JP2017092950A

JP2017092950A - Information processing apparatus, conference system, information processing method, and program

Info

Publication number: JP2017092950A
Application number: JP2016201513A
Authority: JP
Inventors: 未来袴谷; Miku Hakamatani; 高橋　仁人; Masahito Takahashi; 仁人高橋; 耕司桑田; Koji Kuwata; 清人五十嵐; Kiyoto Igarashi; 和紀北澤; Kazuki Kitazawa; 智幸後藤; Tomoyuki Goto; 宣正銀川; Nobumasa Gingawa
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-11-05
Filing date: 2016-10-13
Publication date: 2017-05-25

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, a conference system, an information processing method, and a program that make it possible to grasp the state of a conference site even in a system that extracts the video region of a speaker.
SOLUTION: An information processing apparatus that communicates video and sound with another information processing apparatus comprises: an input unit that accepts input of sound from a sound source; an imaging unit that captures video of the sound source; a reception unit that receives video and sound from the other information processing apparatus; an extraction unit that, when sound is input by the input unit, extracts from the video captured by the imaging unit a video region including the sound source that generated the sound and takes it as a first video, and, when sound is received from the other information processing apparatus by the reception unit, extracts from the video captured by the imaging unit a video region covering at least a wider range than the first video and takes it as a second video; and a transmission unit that transmits at least one of the first video and the second video to the other information processing apparatus.
SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing apparatus, a conference system, an information processing method, and a program.

Video conference systems that hold remote conferences between distant locations over a communication network such as the Internet have become widespread. In such a system, a terminal device installed in the conference room of one party to the remote conference (video conference) captures images (video) of the participants in that room and takes in the voices they utter, and transmits the video data and audio data to the terminal device of the other party. The other party's terminal shows the video on a display in its conference room and outputs the audio through a speaker, so that a conference between remote locations is realized in a state close to a face-to-face conference.

A video conference system uses a microphone to acquire the voices of the conference participants and a camera to acquire video. Because a camera has a limited angle of view, however, it cannot capture participants who are outside that angle of view. A known way to solve this problem is to use a panoramic camera that can capture all directions over 360 degrees. A microphone, on the other hand, is usually omnidirectional, so it cannot determine which participant uttered a voice, that is, the direction of the voice. A known way to solve this problem is to use a microphone array, which makes it possible to determine the direction of the voice.

As a video conference system of this kind, a technique has been disclosed in which, from an image captured by an omnidirectional camera module using a reflecting mirror, the portion corresponding to the direction of a voice identified with a microphone array is digitally cut out and displayed (Patent Document 1).

In the technique described in Patent Document 1, when a participant at the local site is speaking, the display device at the partner site switches to and displays the participant who is speaking. However, the technique does not specify how the partner site should display the state of the participants at the local site when none of them is speaking, so the overall state of the local site cannot be grasped.

The present invention has been made in view of the above, and its object is to provide an information processing apparatus, a conference system, an information processing method, and a program that make it possible to grasp the state of a conference site even in a system that cuts out the video region of a speaker.

In order to solve the above problem and achieve the object, the present invention is an information processing apparatus that communicates video and sound with another information processing apparatus, comprising: an input unit that accepts input of sound from a sound source; an imaging unit that captures video of the sound source; a reception unit that receives video and sound from the other information processing apparatus; a cutout unit that, when sound is being input by the input unit, cuts out from the video captured by the imaging unit a video region including the sound source that emitted the sound and takes it as a first video, and, when sound is being received from the other information processing apparatus by the reception unit, cuts out from the video captured by the imaging unit a video region covering at least a wider range than the first video and takes it as a second video; and a transmission unit that transmits at least one of the first video and the second video to the other information processing apparatus.

According to the present invention, the state of a conference site can be grasped even in a system that cuts out the video region of a speaker.
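The selection between the first video and the second video described above can be illustrated with a minimal sketch. The names `Region` and `choose_cutout`, the default widths, and the rule that local sound takes precedence when both conditions hold are assumptions made for illustration only; they are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Horizontal span of the panoramic frame, in degrees (hypothetical type)."""
    center_deg: float
    width_deg: float


def choose_cutout(local_direction_deg: Optional[float],
                  remote_sound_active: bool,
                  speaker_width_deg: float = 60.0,
                  wide_width_deg: float = 360.0) -> Optional[Region]:
    """Return the region to cut out, or None when neither condition holds."""
    if local_direction_deg is not None:
        # Sound is being input locally: first video, around the sound source.
        # Giving local sound precedence is an assumption of this sketch.
        return Region(local_direction_deg % 360.0, speaker_width_deg)
    if remote_sound_active:
        # Sound is being received from the other apparatus: second video,
        # which covers a wider range than the first video.
        return Region(0.0, wide_width_deg)
    return None
```

For example, `choose_cutout(90.0, False)` yields a narrow region around 90 degrees, while `choose_cutout(None, True)` yields the wide region.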

FIG. 1 is a diagram illustrating an example of the overall configuration of a conference system according to an embodiment.
FIG. 2 is a diagram illustrating an example of the hardware configuration of the information processing apparatus according to the embodiment.
FIG. 3 is a diagram illustrating an example of the arrangement of a plurality of microphones and of a panoramic camera in the information processing apparatus according to the embodiment.
FIG. 4 is a diagram illustrating an example of the functional block configuration of the information processing apparatus according to the embodiment.
FIG. 5 is a diagram illustrating examples of the arrangement of participants in a conference.
FIG. 6 is a flowchart illustrating an example of the flow of mode determination processing of the information processing apparatus according to the embodiment.
FIG. 7 is a diagram outlining the video display in each mode.
FIG. 8 is a flowchart illustrating an example of the flow of the video cut-out operation in the two-screen switching mode of the information processing apparatus according to the embodiment.
FIG. 9 is a diagram illustrating an example of screen transitions in the two-screen switching mode of the information processing apparatus according to the embodiment.
FIG. 10 is a diagram illustrating examples of the cut-out range used to cut out a video of all participants in the video conference.
FIG. 11 is a diagram illustrating an example of the video display when a video conference is held among three or more sites.

Embodiments of an information processing apparatus, a conference system, an information processing method, and a program according to the present invention are described in detail below with reference to FIGS. 1 to 11. The present invention is not limited by the following embodiments, and the constituent elements of the following embodiments include those that a person skilled in the art could easily conceive of, those that are substantially the same, and those within the so-called range of equivalents. Furthermore, various omissions, substitutions, changes, and combinations of the constituent elements can be made without departing from the gist of the following embodiments.

(Configuration of the conference system)
FIG. 1 is a diagram illustrating an example of the overall configuration of a conference system according to an embodiment. The configuration of the conference system 1 according to the present embodiment is described with reference to FIG. 1.

As shown in FIG. 1, the conference system 1 according to the present embodiment includes two or more information processing apparatuses (information processing apparatuses 10a, 10b, ...) and a conference server 20. The information processing apparatuses 10a and 10b can communicate with each other, and with the conference server 20, via a network 2 such as the Internet. When any one of the two or more information processing apparatuses (10a, 10b, ...) shown in FIG. 1 is meant, or when they are referred to collectively, the term "information processing apparatus 10" is used. FIG. 1 shows an example in which the information processing apparatus 10a is installed at site A and the information processing apparatus 10b is installed at site B.

The information processing apparatus 10 is a conference terminal device that establishes a session with another information processing apparatus under the control of the conference server 20 and transmits and receives audio data and video data over the established session. A video conference (hereinafter sometimes simply called a "conference") among the plurality of information processing apparatuses (10a, 10b, ...) is thereby realized in the conference system 1.

The conference server 20 is a server device that monitors whether each information processing apparatus 10 is connected to the conference server 20, controls the calling of each information processing apparatus 10 at the start of a conference, and controls information processing during the conference.

(Hardware configuration of the information processing apparatus)
FIG. 2 is a diagram illustrating an example of the hardware configuration of the information processing apparatus according to the embodiment. FIG. 3 is a diagram illustrating an example of the arrangement of a plurality of microphones and of a panoramic camera in the information processing apparatus according to the embodiment. The hardware configuration of the information processing apparatus 10 according to the present embodiment is described in detail with reference to FIGS. 2 and 3.

As shown in FIG. 2, the information processing apparatus 10 according to the present embodiment includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an auxiliary storage device 204, a media drive 205, an operation button 206, a power switch 207, a network I/F 208, an image sensor I/F 209, a panoramic camera 210, an audio I/F 211, a microphone array 212, a speaker 213, an output I/F 214, and an external device I/F 216.

The CPU 201 is an integrated circuit that controls the operation of the entire information processing apparatus 10. The ROM 202 is a non-volatile storage device that stores programs such as firmware for the information processing apparatus 10. The RAM 203 is a volatile storage device used as a work area for the CPU 201.

The auxiliary storage device 204 is a non-volatile storage device that stores the various programs that realize the operation of the information processing apparatus 10, as well as various data such as video data and audio data. The auxiliary storage device 204 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

The media drive 205 is a device that controls the reading and writing of data from and to a recording medium 205a such as a flash memory.

The operation button 206 is a button for performing setting operations and the like on the information processing apparatus 10. The power switch 207 is a switch for turning the power of the information processing apparatus 10 on and off.

The network I/F 208 is an interface for communicating data over the network 2. The network I/F 208 is, for example, a NIC (Network Interface Card). The image sensor I/F 209 is an interface for transferring video data to and from the panoramic camera 210, which captures a subject and obtains video data under the control of the CPU 201.

The panoramic camera 210 is an imaging device that includes a lens and a solid-state image sensor that converts light into electric charge and turns the image (video) of a subject into digital data. The panoramic camera 210 acquires video data covering the full 360 degrees around it. Acquiring 360-degree video data makes it possible to capture all the conference participants located around the information processing apparatus 10. The panoramic camera 210 is connected to the image sensor I/F 209. A CMOS (Complementary Metal Oxide Semiconductor) sensor, a CCD (Charge Coupled Device) sensor, or the like is used as the solid-state image sensor. The panoramic camera 210 is installed, for example, at the center of the upper surface of the information processing apparatus 10, as shown in FIG. 3.

The audio I/F 211 is an interface that, under the control of the CPU 201, handles the input and output of audio signals to and from the microphone array 212, which inputs sound, and the speaker 213, which outputs sound. The microphone array 212 is a sound collecting device that takes in the voices of the participants in the conference. The microphone array 212 has a plurality of microphones and, under the control of the CPU 201, can determine, for example, the direction of a voice uttered by a conference participant. The speaker 213 is a device that outputs sound under the control of the CPU 201. The microphone array 212 and the speaker 213 are each connected to the audio I/F 211. The microphone array 212 has, for example, six microphones (212a to 212f), which are distributed over the upper surface of the housing of the information processing apparatus 10, as shown in FIG. 3. The microphone array 212 can determine the direction of a voice on the basis of, for example, the differences in the timing at which the voice reaches each of the microphones 212a to 212f. The microphone array 212 is not limited to the six-microphone configuration shown in FIG. 3 and only needs to have a plurality of microphones. In addition, although the microphones of the microphone array 212 are distributed over the housing of the information processing apparatus 10 in FIG. 3, the configuration is not limited to this; a microphone array 212 unit having the microphones 212a to 212f may be configured as a body separate from the housing of the information processing apparatus 10.

The output I/F 214 is an interface for transmitting video data to an external display device 215 under the control of the CPU 201. The external device connection I/F 216 is an interface to which external devices such as an external camera, an external microphone, and an external speaker can each be electrically connected by a USB (Universal Serial Bus) cable or the like.

The display device 215 displays video of the participants at the other sites taking part in the conference. The display device 215 is, for example, a CRT (Cathode Ray Tube) display, an LCD (Liquid Crystal Display), or an organic EL (Organic Electro-Luminescence) display. The display device 215 is connected to the output I/F 214 by a cable 214a. The cable 214a may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (registered trademark) (High-Definition Multimedia Interface) or DVI (Digital Visual Interface) signals.

The CPU 201, ROM 202, RAM 203, auxiliary storage device 204, media drive 205, operation button 206, power switch 207, network I/F 208, image sensor I/F 209, audio I/F 211, output I/F 214, and external device I/F 216 described above are connected so that they can communicate with one another via a bus 217 such as an address bus and a data bus.

The hardware configuration of the information processing apparatus 10 is not limited to the configuration shown in FIG. 2. For example, the media drive 205 may be omitted.

(Functional block configuration of the information processing apparatus)
FIG. 4 is a diagram illustrating an example of the functional block configuration of the information processing apparatus according to the embodiment. FIG. 5 is a diagram illustrating examples of the arrangement of participants in a conference. The functional block configuration of the information processing apparatus 10 according to the present embodiment is described in detail with reference to FIGS. 4 and 5.

As shown in FIG. 4, the information processing apparatus 10 according to the present embodiment includes a recognition unit 101, an arrangement specifying unit 102 (second specifying unit), a cutout unit 103, a direction specifying unit 104 (first specifying unit), a management unit 105 (switching unit), a designation unit 106, a transmission unit 107, a reception unit 108, an imaging control unit 109, a display control unit 110, an audio output control unit 111, an input unit 112, a storage unit 113, an operation unit 114, a communication unit 115, an imaging unit 116, a display unit 117, and an audio output unit 118.

The recognition unit 101 is a functional unit that recognizes, as face images, the face portions of one or more participants included in an image captured by the imaging unit 116 (hereinafter sometimes called "face recognition"). The recognition unit 101 may use a known image analysis method for face recognition. Counting the number of faces that the recognition unit 101 recognizes in the image captured by the imaging unit 116 makes it possible to know the number of participants in the conference. The recognition unit 101 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

Although the recognition unit 101 is described as recognizing a participant by recognizing the participant's face portion as a face image, it is not limited to this and may recognize participants by another recognition method, such as recognition based on human body detection.

The arrangement specifying unit 102 is a functional unit that identifies, in the image captured by the imaging unit 116, the positions of the participants whose faces were recognized by the recognition unit 101, and thereby specifies the arrangement pattern of the conference participants in the conference room. For example, in the conference room shown in FIG. 5(a), the information processing apparatus 10 and the display device 215 are placed on a desk 40, and a whiteboard 50 is installed near the display device 215 (within region P7 in FIG. 5(a)). When the imaging unit 116 of the information processing apparatus 10 captures an image covering all directions over 360 degrees (hereinafter sometimes called a "panoramic image") in this conference room, the arrangement specifying unit 102 specifies, for the participants 60a to 60e whose faces were recognized by the recognition unit 101, an arrangement pattern indicating that participant 60a is in region P1, participant 60b in region P2, participant 60c in region P3, participant 60d in region P5, and participant 60e in region P6. In FIG. 5(b), for the participants 61a to 61d whose faces were recognized by the recognition unit 101 in the panoramic image captured by the imaging unit 116, the arrangement specifying unit 102 specifies an arrangement pattern indicating that participant 61a is in region P2, participant 61b in region P3, participant 61c in region P4, and participant 61d in region P5. In other words, specifying the arrangement pattern means specifying in which direction each participant is located relative to the information processing apparatus 10. The arrangement specifying unit 102 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.
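One way to picture an arrangement pattern is as a mapping from each recognized participant to the region that contains the direction of that participant's face. The sketch below assumes, purely for illustration, that the surroundings are divided into equal angular sectors labeled P1, P2, ... and that each face is already reduced to an azimuth in degrees; the function name `arrangement_pattern`, the sector count, and the sector boundaries are hypothetical and are not given in the embodiment.

```python
def arrangement_pattern(face_azimuths_deg, num_regions=8):
    """Map each participant to a region label such as "P3".

    face_azimuths_deg: dict of participant id -> azimuth in degrees.
    num_regions: assumed number of equal sectors (not from the source).
    """
    sector = 360.0 / num_regions
    pattern = {}
    for participant, azimuth in face_azimuths_deg.items():
        index = int((azimuth % 360.0) // sector)
        # Guard against floating-point results that land exactly on 360.
        index = min(index, num_regions - 1)
        pattern[participant] = f"P{index + 1}"
    return pattern
```

With eight assumed sectors of 45 degrees each, faces at 10, 50, and 100 degrees fall in regions P1, P2, and P3 respectively.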

The image captured by the imaging unit 116 is sometimes called a "panoramic image" above; when what the imaging unit 116 captures is referred to as video, it is sometimes called a "panoramic video". Video is treated here as a concept that includes images.

The cutout unit 103 is a functional unit that cuts out, from the panoramic video, either the video region of a specific participant whose face was recognized by the recognition unit 101 or a video region that includes all the participants in the conference. When cutting out the video region of a specific participant from the panoramic video, the cutout unit 103 may, for example, cut out the video region corresponding to the direction of the voice of the participant (sound source) specified by the direction specifying unit 104, as described later. The cutout unit 103 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.
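Cutting a region out of a 360-degree panorama has to handle the seam where the right edge of the frame wraps around to the left edge. The following is a minimal sketch under several assumptions that are not stated in the embodiment: the panoramic frame is a list of pixel rows whose columns map linearly to azimuth from 0 to 360 degrees, and the cut-out has a fixed angular width `view_deg`. The function names are hypothetical.

```python
def cutout_columns(direction_deg, frame_width, view_deg=60.0):
    """Return the column indices of the region centered on direction_deg."""
    center = round((direction_deg % 360.0) * frame_width / 360.0)
    half = round(view_deg * frame_width / 720.0)
    # The modulo wraps indices that run past either edge of the panorama.
    return [(center + offset) % frame_width for offset in range(-half, half)]


def cut_out(frame, direction_deg, view_deg=60.0):
    """Cut the region around direction_deg out of a frame (list of rows)."""
    columns = cutout_columns(direction_deg, len(frame[0]), view_deg)
    return [[row[c] for c in columns] for row in frame]
```

A cut-out that includes all participants could be obtained in the same way by choosing a center and a width that span every occupied direction.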

The direction specifying unit 104 is a functional unit that specifies the direction of the sound input by the input unit 112, which is a microphone array. Specifically, the direction specifying unit 104 specifies the direction of the sound on the basis of, for example, the differences in the timing at which the sound reaches the plurality of microphones that make up the input unit 112. The direction specifying unit 104 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.
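The embodiment does not give the algorithm used to turn timing differences into a direction. As one illustrative possibility, the sketch below assumes that the microphones are evenly spaced on a circle and that the sound source is far enough away to be treated as a plane wave; both are assumptions of this example, as are the function names and the array radius. Under those assumptions the arrival time at a microphone at angle a is t0 - (r/c)·cos(φ - a), and the source azimuth φ follows in closed form because the common offset t0 cancels when summed over evenly spaced angles.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def mic_angles(num_mics=6):
    """Angles (radians) of microphones evenly spaced on a circle (assumed)."""
    return [2.0 * math.pi * i / num_mics for i in range(num_mics)]


def simulate_arrival_times(source_deg, radius_m=0.05, num_mics=6):
    """Plane-wave arrival times; microphones facing the source hear it first."""
    phi = math.radians(source_deg)
    delay = radius_m / SPEED_OF_SOUND
    return [-delay * math.cos(phi - a) for a in mic_angles(num_mics)]


def estimate_direction_deg(arrival_times):
    """Estimate the source azimuth from per-microphone arrival times."""
    angles = mic_angles(len(arrival_times))
    x = -sum(t * math.cos(a) for t, a in zip(arrival_times, angles))
    y = -sum(t * math.sin(a) for t, a in zip(arrival_times, angles))
    return math.degrees(math.atan2(y, x)) % 360.0
```

In practice the arrival-time differences themselves would have to be measured, for example by cross-correlating the microphone signals, which this sketch does not cover.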

The management unit 105 is a functional unit that manages the operation mode of the information processing apparatus 10. Specifically, the management unit 105 counts the number of faces recognized by the recognition unit 101 in the panoramic image to determine the number of participants in the conference, and decides the operation mode of the information processing apparatus 10. The operation of deciding the operation mode of the information processing apparatus 10 is described later with reference to FIG. 6. The management unit 105 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The designation unit 106 is a functional unit that designates in which display area of the display unit 117 of the information processing apparatus 10 at another site the video data corresponding to the video region cut out by the cutout unit 103 is to be displayed. Specifically, the designation unit 106 generates designation information that designates the display area in which the video data is to be displayed. The designation unit 106 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The transmission unit 107 is a functional unit that transmits the video data corresponding to the video region cut out by the cutout unit 103 and the audio data input by the input unit 112 to the information processing apparatus 10 at another site via the communication unit 115 and the network 2. Specifically, the transmission unit 107, for example, encodes the video data and audio data and transmits them to the information processing apparatus 10 at the other site. A known method may be used for the encoding; for example, a compression coding technique such as H.264/AVC or H.264/SVC may be used. The transmission unit 107 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The reception unit 108 is a functional unit that receives video data and audio data from the information processing apparatus 10 at another site via the network 2 and the communication unit 115. Specifically, the reception unit 108, for example, decodes the received video data and audio data, sends the decoded video data to the display control unit 110, and sends the decoded audio data to the audio output control unit 111. A known method may be used for the decoding. The reception unit 108 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

撮像制御部１０９は、撮像部１１６の動作を制御する機能部である。具体的には、撮像制御部１０９は、例えば、撮像部１１６による撮像の開始および停止の動作等を制御し、撮像部１１６により撮像されたパノラマ映像を取得する。撮像制御部１０９は、例えば、図２に示すＣＰＵ２０１がプログラムを実行することによって実現される。 The imaging control unit 109 is a functional unit that controls the operation of the imaging unit 116. Specifically, for example, the imaging control unit 109 controls the start and stop operations of imaging by the imaging unit 116 and acquires a panoramic image captured by the imaging unit 116. The imaging control unit 109 is realized, for example, when the CPU 201 illustrated in FIG. 2 executes a program.

表示制御部１１０は、表示部１１７に各種画像を表示させる制御を行う機能部である。表示制御部１１０は、例えば、図２に示すＣＰＵ２０１がプログラムを実行することによって実現される。 The display control unit 110 is a functional unit that performs control to display various images on the display unit 117. The display control unit 110 is realized, for example, when the CPU 201 illustrated in FIG. 2 executes a program.

音声出力制御部１１１は、音声出力部１１８に各種音声を出力させる制御を行う機能部である。音声出力制御部１１１は、例えば、図２に示すＣＰＵ２０１がプログラムを実行することによって実現される。 The audio output control unit 111 is a functional unit that controls the audio output unit 118 to output various types of audio. The audio output control unit 111 is realized, for example, when the CPU 201 illustrated in FIG. 2 executes a program.

入力部１１２は、音声を入力する機能部である。入力部１１２は、例えば、図２に示すマイクアレイ２１２によって実現される。 The input unit 112 is a functional unit that accepts audio input. The input unit 112 is realized by, for example, the microphone array 212 illustrated in FIG. 2.

記憶部１１３は、情報処理装置１０の動作を実現する各種プログラム、映像データ、音声データ、および配置特定部１０２によって特定された配置パターン等の情報を記憶する機能部である。記憶部１１３は、例えば、図２に示すＲＡＭ２０３および補助記憶装置２０４によって実現される。 The storage unit 113 is a functional unit that stores various programs for realizing the operation of the information processing apparatus 10, video data, audio data, and information such as the arrangement pattern specified by the arrangement specifying unit 102. The storage unit 113 is realized by, for example, the RAM 203 and the auxiliary storage device 204 illustrated in FIG. 2.

操作部１１４は、情報処理装置１０の利用者（例えば、会議の参加者）の各種操作入力を受け付ける機能部である。操作部１１４は、例えば、図２に示す操作ボタン２０６および電源スイッチ２０７等によって実現される。なお、操作部１１４は、図２に示す操作ボタン２０６および電源スイッチ２０７に限定されるものではなく、マウス、キーボード、またはタッチパネル等によって実現されるものとしてもよい。 The operation unit 114 is a functional unit that receives various operation inputs from a user of the information processing apparatus 10 (for example, a conference participant). The operation unit 114 is realized by, for example, the operation button 206 and the power switch 207 illustrated in FIG. 2. Note that the operation unit 114 is not limited to the operation button 206 and the power switch 207 illustrated in FIG. 2, and may be realized by a mouse, a keyboard, a touch panel, or the like.

通信部１１５は、ネットワーク２を介して、他の情報処理装置１０、および会議サーバ２０とデータ通信をする機能部である。通信部１１５は、例えば、図２に示すネットワークＩ／Ｆ２０８によって実現される。 The communication unit 115 is a functional unit that performs data communication with the other information processing apparatus 10 and the conference server 20 via the network 2. The communication unit 115 is realized by, for example, the network I/F 208 illustrated in FIG. 2.

撮像部１１６は、３６０度全方向のパノラマ画像またはパノラマ映像を撮像する機能部である。撮像部１１６は、例えば、図２に示すパノラマカメラ２１０によって実現される。 The imaging unit 116 is a functional unit that captures panoramic images or panoramic video covering all directions over 360 degrees. The imaging unit 116 is realized by, for example, the panoramic camera 210 illustrated in FIG. 2.

表示部１１７は、表示制御部１１０の制御に従って、各種画像を表示する機能部である。表示部１１７は、例えば、図２に示す表示装置２１５によって実現される。 The display unit 117 is a functional unit that displays various images under the control of the display control unit 110. The display unit 117 is realized by, for example, the display device 215 illustrated in FIG. 2.

音声出力部１１８は、音声出力制御部１１１の制御に従って、各種音声を出力する機能部である。音声出力部１１８は、例えば、図２に示すスピーカ２１３によって実現される。 The audio output unit 118 is a functional unit that outputs various sounds under the control of the audio output control unit 111. The audio output unit 118 is realized by, for example, the speaker 213 illustrated in FIG. 2.

なお、図４に示す情報処理装置１０の認識部１０１、配置特定部１０２、切出部１０３、方向特定部１０４、管理部１０５、指定部１０６、送信部１０７、受信部１０８、撮像制御部１０９、表示制御部１１０、音声出力制御部１１１、入力部１１２、記憶部１１３、操作部１１４、通信部１１５、撮像部１１６、表示部１１７および音声出力部１１８は、機能を概念的に示したものであって、このような構成に限定されるものではない。例えば、図４に示す情報処理装置１０で独立した機能部として図示した複数の機能部を、１つの機能部として構成してもよい。一方、図４に示す情報処理装置１０で１つの機能部が有する機能を複数に分割し、複数の機能部として構成するものとしてもよい。 Note that the recognition unit 101, the arrangement specifying unit 102, the cutout unit 103, the direction specifying unit 104, the management unit 105, the designation unit 106, the transmission unit 107, the receiving unit 108, the imaging control unit 109, the display control unit 110, the audio output control unit 111, the input unit 112, the storage unit 113, the operation unit 114, the communication unit 115, the imaging unit 116, the display unit 117, and the audio output unit 118 of the information processing apparatus 10 illustrated in FIG. 4 represent functions conceptually, and the apparatus is not limited to this configuration. For example, a plurality of functional units illustrated as independent functional units in the information processing apparatus 10 illustrated in FIG. 4 may be configured as one functional unit. Conversely, the functions of one functional unit in the information processing apparatus 10 illustrated in FIG. 4 may be divided and configured as a plurality of functional units.

また、情報処理装置１０の認識部１０１、配置特定部１０２、切出部１０３、方向特定部１０４、管理部１０５、指定部１０６、送信部１０７、受信部１０８、撮像制御部１０９、表示制御部１１０および音声出力制御部１１１の一部または全部は、ソフトウェアであるプログラムではなく、ＦＰＧＡ（Ｆｉｅｌｄ−ＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）またはＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）等のハードウェア回路によって実現されてもよい。 In addition, some or all of the recognition unit 101, the arrangement specifying unit 102, the cutout unit 103, the direction specifying unit 104, the management unit 105, the designation unit 106, the transmission unit 107, the receiving unit 108, the imaging control unit 109, the display control unit 110, and the audio output control unit 111 of the information processing apparatus 10 may be realized by a hardware circuit such as an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit) instead of by a software program.

（モード決定処理）
図６は、実施の形態に係る情報処理装置のモード決定処理の流れの一例を示すフローチャートである。図７は、各モードでの映像表示の概要を説明する図である。図６および７を参照しながら、本実施の形態に係る情報処理装置１０のモード決定処理の流れについて説明する。なお、以下の説明では、自拠点の情報処理装置１０と、相手拠点の情報処理装置１０との２拠点間でのビデオ会議を想定して説明する。 (Mode determination process)
FIG. 6 is a flowchart illustrating an example of the flow of the mode determination process of the information processing apparatus according to the embodiment. FIG. 7 is a diagram illustrating an overview of video display in each mode. The flow of the mode determination process of the information processing apparatus 10 according to the present embodiment will be described with reference to FIGS. 6 and 7. The following description assumes a video conference between two sites: the information processing apparatus 10 at the local site and the information processing apparatus 10 at the partner site.

＜ステップＳ１１＞
まず、会議に参加しようとする参加者は、情報処理装置１０の操作部１１４を操作して、情報処理装置１０の電源をＯＮ状態にし、撮像部１１６によって周囲の画像（パノラマ画像）を撮像するための操作入力を行う。撮像制御部１０９は、操作部１１４からパノラマ画像を撮像するための操作情報を受け取ると、撮像部１１６にパノラマ画像を撮像させる。そして、ステップＳ１２へ移行する。 <Step S11>
First, a participant who wishes to join the conference operates the operation unit 114 of the information processing apparatus 10 to turn on the power of the information processing apparatus 10, and performs an operation input for capturing an image of the surroundings (a panoramic image) with the imaging unit 116. Upon receiving operation information for capturing a panoramic image from the operation unit 114, the imaging control unit 109 causes the imaging unit 116 to capture a panoramic image. Then, the process proceeds to step S12.

＜ステップＳ１２＞
認識部１０１は、撮像部１１６により撮像された画像に含まれる１以上の参加者の顔の部分を顔画像として認識（顔認識）する。そして、ステップＳ１３へ移行する。 <Step S12>
The recognition unit 101 recognizes the face portion of each of the one or more participants included in the image captured by the imaging unit 116 as a face image (face recognition). Then, the process proceeds to step S13.

＜ステップＳ１３＞
配置特定部１０２は、撮像部１１６によって撮像されたパノラマ画像において、認識部１０１により顔認識された参加者の位置を特定し、会議に参加する参加者の会議室における参加者の配置パターンを特定する。配置特定部１０２は、特定した配置パターンの情報を、記憶部１１３に記憶させる。そして、ステップＳ１４へ移行する。 <Step S13>
The arrangement specifying unit 102 specifies the positions of the participants whose faces were recognized by the recognition unit 101 in the panoramic image captured by the imaging unit 116, and thereby specifies the arrangement pattern of the conference participants in the conference room. The arrangement specifying unit 102 stores information on the specified arrangement pattern in the storage unit 113. Then, the process proceeds to step S14.

＜ステップＳ１４＞
管理部１０５は、認識部１０１によりパノラマ画像から顔認識された回数をカウントすることによって、会議の参加者の人数を把握する。会議の参加者の人数が１人である場合（ステップＳ１４：１人）、ステップＳ１５へ移行し、参加者の人数が２人である場合（ステップＳ１４：２人）、ステップＳ１６へ移行し、参加者の人数が３人以上である場合（ステップＳ１４：３人以上）、ステップＳ１７へ移行する。 <Step S14>
The management unit 105 determines the number of conference participants by counting the number of face recognitions made by the recognition unit 101 in the panoramic image. When the number of participants is one (step S14: one), the process proceeds to step S15. When the number of participants is two (step S14: two), the process proceeds to step S16. When the number of participants is three or more (step S14: three or more), the process proceeds to step S17.

＜ステップＳ１５＞
管理部１０５は、会議の参加者の人数が１人であると判定した場合、情報処理装置１０の動作モードを１画面固定モード（第１動作モード）に切り替える。そして、情報処理装置１０は、モード決定処理を終了する。 <Step S15>
When determining that the number of participants in the conference is one, the management unit 105 switches the operation mode of the information processing apparatus 10 to the one-screen fixed mode (first operation mode). Then, the information processing apparatus 10 ends the mode determination process.

自拠点の情報処理装置１０は、１画面固定モードで動作する場合、自拠点での会議の参加者は１人なので、相手拠点の情報処理装置１０に対して、自拠点の１人の参加者の映像データおよび音声データを送信する。 When operating in the one-screen fixed mode, the information processing apparatus 10 at the local site has only one conference participant at the local site, and therefore transmits the video data and audio data of that one participant to the information processing apparatus 10 at the partner site.

具体的には、まず、自拠点の情報処理装置１０および相手拠点の情報処理装置１０は、互いに動作モードの情報を、ネットワーク２を介して交換する。ここでは、自拠点の情報処理装置１０は、１画面固定モードで動作することを示す情報を、ネットワーク２を介して相手拠点の情報処理装置１０に送信する。 Specifically, first, the information processing device 10 at the local site and the information processing device 10 at the other site exchange information on the operation mode with each other via the network 2. Here, the information processing apparatus 10 at its own site transmits information indicating that it operates in the one-screen fixed mode to the information processing apparatus 10 at the other site via the network 2.

自拠点の情報処理装置１０の切出部１０３は、撮像部１１６により撮像されたパノラマ映像から、認識部１０１により顔認識された１人の参加者の映像領域を切り出す。自拠点の情報処理装置１０の送信部１０７は、切出部１０３により切り出された映像領域の映像データ、および、入力部１１２により入力された音声データをエンコードして、通信部１１５およびネットワーク２を介して、相手拠点の情報処理装置１０に送信する。 The cutout unit 103 of the information processing apparatus 10 at the local site cuts out, from the panoramic video captured by the imaging unit 116, the video area of the one participant whose face was recognized by the recognition unit 101. The transmission unit 107 of the information processing apparatus 10 at the local site encodes the video data of the video area cut out by the cutout unit 103 and the audio data input by the input unit 112, and transmits them to the information processing apparatus 10 at the partner site via the communication unit 115 and the network 2.

そして、相手拠点の情報処理装置１０の受信部１０８は、ネットワーク２および通信部１１５を介して、自拠点の情報処理装置１０から映像データおよび音声データを受信すると、その映像データおよび音声データをデコードする。相手拠点の情報処理装置１０の受信部１０８は、デコードした映像データを表示制御部１１０に送り、デコードした音声データを音声出力制御部１１１に送る。相手拠点の情報処理装置１０は、自拠点の情報処理装置１０が１画面固定モードで動作していることを認識しているので、相手拠点の情報処理装置１０の表示制御部１１０は、図７（ａ）に示すように、表示部１１７（表示装置２１５）の表示画面２１５ａにおける表示領域３００（特定の表示領域）を図７（ｂ）および（ｃ）のように分割しない。そして、相手拠点の情報処理装置１０の表示制御部１１０は、受信した映像データ（自拠点の情報処理装置１０の切出部１０３により切り出された自拠点の１人の参加者（図７（ａ）の例では参加者Ｘ）を含む映像領域の映像データ）を、図７（ａ）に示すように、表示領域３００に表示させる。また、相手拠点の情報処理装置１０の音声出力制御部１１１は、受信した音声データを音声出力部１１８に音声として出力させる。 When the receiving unit 108 of the information processing apparatus 10 at the partner site receives video data and audio data from the information processing apparatus 10 at the local site via the network 2 and the communication unit 115, it decodes the video data and audio data. The receiving unit 108 of the information processing apparatus 10 at the partner site sends the decoded video data to the display control unit 110 and sends the decoded audio data to the audio output control unit 111. Because the information processing apparatus 10 at the partner site knows that the information processing apparatus 10 at the local site is operating in the one-screen fixed mode, the display control unit 110 of the information processing apparatus 10 at the partner site leaves the display area 300 (specific display area) on the display screen 215a of the display unit 117 (display device 215) undivided, as shown in FIG. 7(a), rather than dividing it as in FIGS. 7(b) and 7(c). The display control unit 110 of the information processing apparatus 10 at the partner site then displays the received video data (the video data of the video area, cut out by the cutout unit 103 of the information processing apparatus 10 at the local site, that includes the one participant at the local site (participant X in the example of FIG. 7(a))) in the display area 300, as shown in FIG. 7(a). In addition, the audio output control unit 111 of the information processing apparatus 10 at the partner site causes the audio output unit 118 to output the received audio data as audio.

＜ステップＳ１６＞
管理部１０５は、会議の参加者の人数が２人であると判定した場合、情報処理装置１０の動作モードを２画面固定モード（第２動作モード）に切り替える。そして、情報処理装置１０は、モード決定処理を終了する。 <Step S16>
When the management unit 105 determines that the number of participants in the conference is two, the management unit 105 switches the operation mode of the information processing apparatus 10 to the two-screen fixed mode (second operation mode). Then, the information processing apparatus 10 ends the mode determination process.

具体的には、まず、自拠点の情報処理装置１０および相手拠点の情報処理装置１０は、互いに動作モードの情報を、ネットワーク２を介して交換する。ここでは、自拠点の情報処理装置１０は、２画面固定モードで動作することを示す情報を、ネットワーク２を介して相手拠点の情報処理装置１０に送信する。 Specifically, first, the information processing device 10 at the local site and the information processing device 10 at the other site exchange information on the operation mode with each other via the network 2. Here, the information processing apparatus 10 at its own site transmits information indicating that it operates in the two-screen fixed mode to the information processing apparatus 10 at the other site via the network 2.

自拠点の情報処理装置１０の切出部１０３は、撮像部１１６により撮像されたパノラマ映像から、認識部１０１により顔認識された２人の参加者それぞれの映像領域を切り出す。自拠点の情報処理装置１０の送信部１０７は、切出部１０３により切り出された映像領域の映像データ、および、入力部１１２により入力された音声データをエンコードして、通信部１１５およびネットワーク２を介して、相手拠点の情報処理装置１０に送信する。 The cutout unit 103 of the information processing apparatus 10 at the local site cuts out, from the panoramic video captured by the imaging unit 116, the video area of each of the two participants whose faces were recognized by the recognition unit 101. The transmission unit 107 of the information processing apparatus 10 at the local site encodes the video data of the video areas cut out by the cutout unit 103 and the audio data input by the input unit 112, and transmits them to the information processing apparatus 10 at the partner site via the communication unit 115 and the network 2.

そして、相手拠点の情報処理装置１０の受信部１０８は、ネットワーク２および通信部１１５を介して、自拠点の情報処理装置１０から映像データおよび音声データを受信すると、その映像データおよび音声データをデコードする。相手拠点の情報処理装置１０の受信部１０８は、デコードした映像データを表示制御部１１０に送り、デコードした音声データを音声出力制御部１１１に送る。相手拠点の情報処理装置１０は、自拠点の情報処理装置１０が２画面固定モードで動作していることを認識しているので、相手拠点の情報処理装置１０の表示制御部１１０は、図７（ｂ）に示すように、表示部１１７（表示装置２１５）の表示画面２１５ａにおける表示領域３００を２分割して、分割領域３００ａおよび分割領域３００ｂを生成する。そして、相手拠点の情報処理装置１０の表示制御部１１０は、受信した自拠点の２人の映像データ（自拠点の情報処理装置１０の切出部１０３により切り出された自拠点の２人の参加者（図７（ｂ）の例では参加者Ｘ、Ｙ）をそれぞれ含む映像領域の映像データ）を、図７（ｂ）に示すように、分割領域３００ａ、３００ｂにそれぞれ表示させる。また、相手拠点の情報処理装置１０の音声出力制御部１１１は、受信した音声データを音声出力部１１８に音声として出力させる。 When the receiving unit 108 of the information processing apparatus 10 at the partner site receives video data and audio data from the information processing apparatus 10 at the local site via the network 2 and the communication unit 115, it decodes the video data and audio data. The receiving unit 108 of the information processing apparatus 10 at the partner site sends the decoded video data to the display control unit 110 and sends the decoded audio data to the audio output control unit 111. Because the information processing apparatus 10 at the partner site knows that the information processing apparatus 10 at the local site is operating in the two-screen fixed mode, the display control unit 110 of the information processing apparatus 10 at the partner site divides the display area 300 on the display screen 215a of the display unit 117 (display device 215) into two to generate a divided area 300a and a divided area 300b, as shown in FIG. 7(b). The display control unit 110 of the information processing apparatus 10 at the partner site then displays the received video data of the two participants at the local site (the video data of the video areas, cut out by the cutout unit 103 of the information processing apparatus 10 at the local site, that respectively include the two participants at the local site (participants X and Y in the example of FIG. 7(b))) in the divided areas 300a and 300b, respectively, as shown in FIG. 7(b). In addition, the audio output control unit 111 of the information processing apparatus 10 at the partner site causes the audio output unit 118 to output the received audio data as audio.

＜ステップＳ１７＞
管理部１０５は、会議の参加者の人数が３人以上であると判定した場合、情報処理装置１０の動作モードを２画面切替モード（第３動作モード）に切り替える。そして、情報処理装置１０は、モード決定処理を終了する。 <Step S17>
When the management unit 105 determines that the number of participants in the conference is three or more, the management unit 105 switches the operation mode of the information processing apparatus 10 to the two-screen switching mode (third operation mode). Then, the information processing apparatus 10 ends the mode determination process.

１画面固定モードおよび２画面固定モードと同様に、自拠点の情報処理装置１０および相手拠点の情報処理装置１０は、互いに動作モードの情報を、ネットワーク２を介して交換する。ここでは、自拠点の情報処理装置１０は、２画面切替モードで動作することを示す情報を、ネットワーク２を介して相手拠点の情報処理装置１０に送信する。２画面切替モードでの映像切り出し動作の詳細については、図８〜１０で後述する。なお、図７（ｃ）の例では、相手拠点の情報処理装置１０の表示部１１７（表示装置２１５）の表示画面２１５ａにおける表示領域３００が、分割領域３００ａ、３００ｂに２分割され、分割領域３００ａに参加者Ｘが表示され、分割領域３００ｂに自拠点の参加者全体（参加者Ｖ〜Ｚ）が表示されている状態を示している。 As in the one-screen fixed mode and the two-screen fixed mode, the information processing apparatus 10 at the local site and the information processing apparatus 10 at the partner site exchange information on their operation modes with each other via the network 2. Here, the information processing apparatus 10 at the local site transmits information indicating that it operates in the two-screen switching mode to the information processing apparatus 10 at the partner site via the network 2. Details of the video cutout operation in the two-screen switching mode will be described later with reference to FIGS. 8 to 10. The example of FIG. 7(c) shows a state in which the display area 300 on the display screen 215a of the display unit 117 (display device 215) of the information processing apparatus 10 at the partner site is divided into two divided areas 300a and 300b, with participant X displayed in the divided area 300a and all the participants at the local site (participants V to Z) displayed in the divided area 300b.

以上のステップＳ１１〜Ｓ１７の動作によって、情報処理装置１０によりモード決定処理が実行される。 The mode determination process is executed by the information processing apparatus 10 through the operations in steps S11 to S17 described above.
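The branch in steps S14 to S17 can be summarized as a small lookup from the recognized face count to an operation mode. The following Python sketch is illustrative only: the names `OperationMode` and `decide_mode` are hypothetical, and the handling of a count of zero is an assumption, since the description does not say what happens when no face is recognized.

```python
from enum import Enum


class OperationMode(Enum):
    ONE_SCREEN_FIXED = 1      # first operation mode (step S15)
    TWO_SCREEN_FIXED = 2      # second operation mode (step S16)
    TWO_SCREEN_SWITCHING = 3  # third operation mode (step S17)


def decide_mode(num_faces: int) -> OperationMode:
    """Map the number of recognized faces to an operation mode (step S14)."""
    if num_faces < 1:
        # Assumption: the description does not cover zero faces, so reject it.
        raise ValueError("at least one participant must be recognized")
    if num_faces == 1:
        return OperationMode.ONE_SCREEN_FIXED
    if num_faces == 2:
        return OperationMode.TWO_SCREEN_FIXED
    return OperationMode.TWO_SCREEN_SWITCHING
```

For instance, `decide_mode(5)` selects the two-screen switching mode, which matches the five-participant situation of FIG. 7(c).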

なお、図６に示すモード決定処理は、会議中において、所定時間ごとに、または、所定の条件を充足した場合に再実行するものとしてもよい。所定の条件を充足した場合とは、例えば、撮像部１１６により撮像されているパノラマ映像において、認識部１０１が顔認識した参加者の位置が移動した場合、前回に認識部１０１により顔認識した利用者がいなくなった場合、または、前回に認識部１０１により顔認識した参加者以外の参加者が顔認識された場合等が挙げられる。 Note that the mode determination process shown in FIG. 6 may be re-executed during the conference at predetermined intervals or when a predetermined condition is satisfied. Examples of the predetermined condition being satisfied include a case where the position of a participant whose face was recognized by the recognition unit 101 has moved in the panoramic video being captured by the imaging unit 116, a case where a user whose face was previously recognized by the recognition unit 101 is no longer present, and a case where the face of a participant other than those previously recognized by the recognition unit 101 is recognized.
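The re-execution conditions above could be checked along the following lines. This is a rough sketch under several assumptions that are not in the description: participant positions are represented as angles in degrees around the panoramic camera, movement is detected with a fixed tolerance, and the default interval and tolerance values are arbitrary placeholders. The function names are hypothetical.

```python
def circular_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles on a 360-degree circle."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def should_rerun(prev_angles, cur_angles, elapsed_s,
                 interval_s=60.0, tolerance_deg=10.0):
    """Return True if the mode determination process should be re-executed."""
    # Condition 1: the predetermined time has elapsed.
    if elapsed_s >= interval_s:
        return True
    # Conditions 2 and 3: someone left, or a new face was recognized.
    if len(prev_angles) != len(cur_angles):
        return True

    def all_matched(src, dst):
        # Every angle in src has some angle in dst within the tolerance.
        return all(any(circular_diff(s, d) <= tolerance_deg for d in dst)
                   for s in src)

    # Condition 4: a participant moved (some position has no nearby match).
    return not (all_matched(prev_angles, cur_angles)
                and all_matched(cur_angles, prev_angles))
```

Matching positions by proximity rather than by identity is a simplification; a system that tracks individual faces could compare each participant's own previous and current positions instead.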

また、図６に示すモード決定処理においては、参加者の人数が２人以上である場合は、２画面固定モードまたは２画面切替モードに切り替え、相手拠点の情報処理装置１０の表示制御部１１０は、図７（ｂ）または図７（ｃ）に示すように、表示部１１７の表示画面２１５ａにおける表示領域３００を２分割して、分割領域３００ａおよび分割領域３００ｂを生成するものとしているが、これに限定されるものではない。すなわち、参加者の人数が２人以上である場合でも、表示領域３００を分割せずに１画面として処理するものとしてもよい。 In the mode determination process shown in FIG. 6, when the number of participants is two or more, the operation mode is switched to the two-screen fixed mode or the two-screen switching mode, and the display control unit 110 of the information processing apparatus 10 at the partner site divides the display area 300 on the display screen 215a of the display unit 117 into two to generate the divided area 300a and the divided area 300b, as shown in FIG. 7(b) or FIG. 7(c). However, the present invention is not limited to this. That is, even when the number of participants is two or more, the display area 300 may be handled as a single screen without being divided.
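As an illustration of how the receiving side might lay out the display area 300 for each mode, the sketch below returns rectangles as `(x, y, width, height)` tuples. The mode strings, the function name `layout_regions`, the `allow_split` flag (modeling the undivided variant just described), and the left/right split orientation are all assumptions made for this example; the description only states that the area is divided into two.

```python
def layout_regions(mode, width, height, allow_split=True):
    """Return display rectangles keyed by region name for the given mode."""
    full = (0, 0, width, height)
    if mode == "one_screen_fixed" or not allow_split:
        # Display area 300 is used as a single, undivided screen.
        return {"300": full}
    # Two-screen fixed or two-screen switching mode: divide area 300 in two.
    half = width // 2
    return {
        "300a": (0, 0, half, height),
        "300b": (half, 0, width - half, height),
    }
```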

（２画面切替モードでの映像切り出し動作）
図８は、実施の形態に係る情報処理装置の２画面切替モードにおける映像切り出し動作の流れの一例を示すフローチャートである。図９は、実施の形態に係る情報処理装置の２画面切替モードにおける画面遷移の例を示す図である。図１０は、ビデオ会議の参加者の全体の映像を切り出す切出範囲の例を説明する図である。図８〜１０を参照しながら、本実施の形態に係る情報処理装置１０の２画面切替モードでの映像切り出し動作の流れについて説明する。なお、以下の説明では、自拠点の情報処理装置１０と、相手拠点の情報処理装置１０との２拠点間でのビデオ会議を想定して説明する。 (Video clipping operation in two-screen switching mode)
FIG. 8 is a flowchart illustrating an example of the flow of the video cutout operation in the two-screen switching mode of the information processing apparatus according to the embodiment. FIG. 9 is a diagram illustrating an example of screen transitions in the two-screen switching mode of the information processing apparatus according to the embodiment. FIG. 10 is a diagram illustrating examples of the cutout range used to cut out video of all the participants in the video conference. The flow of the video cutout operation in the two-screen switching mode of the information processing apparatus 10 according to the present embodiment will be described with reference to FIGS. 8 to 10. The following description assumes a video conference between two sites: the information processing apparatus 10 at the local site and the information processing apparatus 10 at the partner site.

＜ステップＳ３１＞
上述のように、自拠点の情報処理装置１０および相手拠点の情報処理装置１０は、互いに動作モードの情報を、ネットワーク２を介して交換する。ここでは、自拠点の情報処理装置１０は、２画面切替モードで動作することを示す情報を、ネットワーク２を介して相手拠点の情報処理装置１０に送信する。相手拠点の情報処理装置１０は、自拠点の情報処理装置１０が２画面切替モードで動作していることを認識しているので、相手拠点の情報処理装置１０の表示制御部１１０は、図７（ｃ）に示すように、表示部１１７（表示装置２１５）の表示画面２１５ａにおける表示領域３００を２分割して、分割領域３００ａおよび分割領域３００ｂを生成する。そして、自拠点の情報処理装置１０の入力部１１２は、音声の入力の受け付けを開始する。そして、ステップＳ３２へ移行する。 <Step S31>
As described above, the information processing apparatus 10 at the local site and the information processing apparatus 10 at the partner site exchange information on their operation modes with each other via the network 2. Here, the information processing apparatus 10 at the local site transmits information indicating that it operates in the two-screen switching mode to the information processing apparatus 10 at the partner site via the network 2. Because the information processing apparatus 10 at the partner site knows that the information processing apparatus 10 at the local site is operating in the two-screen switching mode, the display control unit 110 of the information processing apparatus 10 at the partner site divides the display area 300 on the display screen 215a of the display unit 117 (display device 215) into two to generate a divided area 300a and a divided area 300b, as shown in FIG. 7(c). The input unit 112 of the information processing apparatus 10 at the local site then starts accepting audio input. Then, the process proceeds to step S32.

＜ステップＳ３２＞
入力部１１２により音声が入力された場合（ステップＳ３２：Ｙｅｓ）、ステップＳ３３へ移行し、入力部１１２により音声が入力されない場合、すなわち、相手拠点の参加者が発話している場合（ステップＳ３２：Ｎｏ）、ステップＳ３８へ移行する。ここで、相手拠点が発話している場合とは、例えば、受信部１０８によって、相手拠点の情報処理装置１０から発話者の映像データおよび音声データが受信された場合である。 <Step S32>
When audio is input by the input unit 112 (step S32: Yes), the process proceeds to step S33. When no audio is input by the input unit 112, that is, when a participant at the partner site is speaking (step S32: No), the process proceeds to step S38. Here, the partner site is regarded as speaking when, for example, the receiving unit 108 receives video data and audio data of a speaker from the information processing apparatus 10 at the partner site.

＜ステップＳ３３＞
自拠点の情報処理装置１０の方向特定部１０４は、３人以上の参加者のうちいずれかが発話することにより入力部１１２に入力された音声の方向を特定する。そして、ステップＳ３４へ移行する。 <Step S33>
The direction specifying unit 104 of the information processing apparatus 10 at the local site specifies the direction of the voice input to the input unit 112 when one of the three or more participants speaks. Then, the process proceeds to step S34.

＜ステップＳ３４＞
自拠点の情報処理装置１０の切出部１０３は、配置特定部１０２により特定された配置パターンが示す参加者の方向のうち、方向特定部１０４により特定された音声の方向に最も近い方向の参加者を、発話している参加者（現在の発話者）と判断し、撮像部１１６により撮像されるパノラマ映像からその参加者を含む映像領域（第１映像）を切り出す。また、切出部１０３は、ステップＳ３３で発話している参加者の前に発話していた参加者（前回の発話者）（第２音源）を含む映像領域の切り出しを継続する。そして、ステップＳ３５へ移行する。なお、切出部１０３は、配置特定部１０２により特定された配置パターンが示す参加者の方向のうち、方向特定部１０４により特定された音声の方向に最も近い方向の参加者の映像領域を切り出すものとしているが、これに限定されるものではない。すなわち、切出部１０３は、配置パターンを使用せずに、方向特定部１０４により特定された音声の方向に対応する映像領域をパノラマ画像から直接切り出すものとしてもよい。 <Step S34>
The cutout unit 103 of the information processing apparatus 10 at the local site determines that, among the participant directions indicated by the arrangement pattern specified by the arrangement specifying unit 102, the participant in the direction closest to the voice direction specified by the direction specifying unit 104 is the participant who is speaking (the current speaker), and cuts out a video area (first video) including that participant from the panoramic video captured by the imaging unit 116. The cutout unit 103 also continues to cut out the video area including the participant who was speaking before the participant speaking in step S33 (the previous speaker) (second sound source). Then, the process proceeds to step S35. Although the cutout unit 103 is described here as cutting out the video area of the participant in the direction closest to the voice direction specified by the direction specifying unit 104 among the participant directions indicated by the arrangement pattern, the present invention is not limited to this. That is, the cutout unit 103 may cut out the video area corresponding to the voice direction specified by the direction specifying unit 104 directly from the panoramic image without using the arrangement pattern.
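The speaker selection in step S34 amounts to a nearest-neighbor search on a circle. The sketch below assumes, purely for illustration, that the arrangement pattern is available as a mapping from participant label to direction in degrees and that the voice direction is expressed in the same convention; neither representation is specified in the description, and the function names are hypothetical.

```python
from typing import Mapping


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def find_speaker(participant_angles: Mapping[str, float],
                 voice_angle: float) -> str:
    """Return the participant whose direction is closest to the voice direction."""
    if not participant_angles:
        raise ValueError("the arrangement pattern contains no participants")
    return min(
        participant_angles,
        key=lambda name: angular_distance(participant_angles[name], voice_angle),
    )
```

Using the wrap-around distance matters for a 360-degree panorama: a voice detected at 5 degrees is closer to a participant at 350 degrees than to one at 90 degrees.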

＜ステップＳ３５＞
自拠点の情報処理装置１０の送信部１０７は、切出部１０３により切り出された現在の発話者および前回の発話者それぞれの映像領域の映像データ、および、入力部１１２により入力された現在の発話者の音声データをエンコードして、通信部１１５およびネットワーク２を介して、相手拠点の情報処理装置１０に送信する。また、自拠点の情報処理装置１０の指定部１０６は、切出部１０３によって切り出された現在の発話者および前回の発話者それぞれの映像領域の映像データを、相手拠点の分割領域３００ａ、３００ｂのいずれに表示させるかを指定する指定情報を生成し、送信部１０７は、当該映像データと共に、この指定情報を相手拠点の情報処理装置１０に送信する。そして、ステップＳ３６へ移行する。 <Step S35>
The transmission unit 107 of the information processing apparatus 10 at the local site encodes the video data of the video areas of the current speaker and the previous speaker cut out by the cutout unit 103, together with the audio data of the current speaker input by the input unit 112, and transmits them to the information processing apparatus 10 at the partner site via the communication unit 115 and the network 2. In addition, the designation unit 106 of the information processing apparatus 10 at the local site generates designation information that designates in which of the divided areas 300a and 300b at the partner site the video data of the video areas of the current speaker and the previous speaker cut out by the cutout unit 103 is to be displayed, and the transmission unit 107 transmits this designation information together with the video data to the information processing apparatus 10 at the partner site. Then, the process proceeds to step S36.

＜ステップＳ３６＞
相手拠点の情報処理装置１０の受信部１０８は、ネットワーク２および通信部１１５を介して、自拠点の情報処理装置１０から映像データおよび音声データを受信すると、その映像データおよび音声データをデコードする。相手拠点の情報処理装置１０の受信部１０８は、デコードした映像データを表示制御部１１０に送り、デコードした音声データを音声出力制御部１１１に送る。そして、ステップＳ３７へ移行する。 <Step S36>
When receiving the video data and the audio data from the information processing apparatus 10 at the local site via the network 2 and the communication unit 115, the reception unit 108 of the information processing apparatus 10 at the partner site decodes the video data and the audio data. The receiving unit 108 of the information processing apparatus 10 at the partner site sends the decoded video data to the display control unit 110 and sends the decoded audio data to the audio output control unit 111. Then, the process proceeds to step S37.

＜ステップＳ３７＞
相手拠点の情報処理装置１０の表示制御部１１０は、受信した前回の発話者の映像領域の映像データを、受信した指定情報の指定に従って、分割領域３００ａ、３００ｂのうち元々表示していた分割領域に継続して表示させる。また、表示制御部１１０は、受信した指定情報の指定に従って、もう一方の分割領域に、受信した現在の発話者の映像領域の映像データを切り替えて表示させる。また、相手拠点の情報処理装置１０の音声出力制御部１１１は、受信した音声データを音声出力部１１８に音声として出力させる。そして、ステップＳ４２へ移行する。 <Step S37>
In accordance with the received designation information, the display control unit 110 of the information processing apparatus 10 at the partner site continues to display the received video data of the previous speaker's video area in whichever of the divided areas 300a and 300b it was already being displayed. Also in accordance with the received designation information, the display control unit 110 switches the other divided area to display the received video data of the current speaker's video area. In addition, the audio output control unit 111 of the information processing apparatus 10 at the partner site causes the audio output unit 118 to output the received audio data as audio. Then, the process proceeds to step S42.
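One way to picture the designation information used in steps S35 and S37 is as a mapping from divided area to the video to be shown there. The sketch below is hypothetical: the description does not define the format of the designation information, so the dictionary representation, the function name `make_designation`, and the fallback to area 300a when the previous speaker is not yet on screen are all assumptions.

```python
REGIONS = ("300a", "300b")


def make_designation(shown, previous_speaker, current_speaker):
    """Build designation information as {divided area: participant label}.

    `shown` maps each divided area to the participant currently displayed there.
    """
    # Keep the previous speaker in the area where they are already displayed.
    # Assumption: fall back to the first area if they are not displayed yet.
    keep = next((r for r in REGIONS if shown.get(r) == previous_speaker),
                REGIONS[0])
    other = REGIONS[1] if keep == REGIONS[0] else REGIONS[0]
    # The other divided area is switched to the current speaker.
    return {keep: previous_speaker, other: current_speaker}
```

For example, if the partner site currently shows X in 300a and Y in 300b, and Z starts speaking after Y, the result keeps Y in 300b and switches 300a to Z.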

＜ステップＳ３８＞
自拠点の情報処理装置１０の切出部１０３は、撮像部１１６により撮像されるパノラマ映像から、配置特定部１０２により特定された配置パターンを用いて自拠点の参加者全体を含む映像領域（第２映像の一例）を切り出す。例えば、図１０（ａ）に示すように、情報処理装置１０が載置された机４０を囲んで、机４０の紙面視上側に２人の参加者が位置し、紙面視下側に１人の参加者が位置している場合を考える。この場合、切出部１０３は、３人の参加者全員を含む範囲であり、かつ、机４０の紙面視上側に位置している２人のうち左側の参加者、および紙面視下側に位置している参加者を端とする最小の範囲である切出範囲４００ａの映像領域を、パノラマ映像から切り出す。 <Step S38>
The cutout unit 103 of the information processing apparatus 10 at the local site cuts out, from the panoramic video captured by the imaging unit 116, a video area (an example of the second video) including all the participants at the local site, using the arrangement pattern specified by the arrangement specifying unit 102. For example, as shown in FIG. 10(a), consider a case where, around the desk 40 on which the information processing apparatus 10 is placed, two participants are located on the upper side of the desk 40 as viewed in the drawing and one participant is located on the lower side. In this case, the cutout unit 103 cuts out from the panoramic video the video area of the cutout range 400a, which is a range that includes all three participants and is the minimum range whose ends are the left one of the two participants on the upper side of the desk 40 and the participant on the lower side.

また、図１０（ｂ）に示すように、情報処理装置１０が載置された机４０を囲んで、机４０の紙面視上側に２人の参加者が位置し、紙面視右側に１人の参加者が位置している場合を考える。この場合、切出部１０３は、３人の参加者全員を含む範囲であり、かつ、机４０の紙面視上側に位置している２人のうち左側の参加者、および紙面視右側に位置している参加者を端とする最小の範囲である切出範囲４００ｂの映像領域を、パノラマ映像から切り出す。 Similarly, as shown in FIG. 10(b), consider a case where, around the desk 40 on which the information processing apparatus 10 is placed, two participants are located on the upper side of the desk 40 as viewed in the drawing and one participant is located on the right side. In this case, the cutout unit 103 cuts out from the panoramic video the video area of the cutout range 400b, which is a range that includes all three participants and is the minimum range whose ends are the left one of the two participants on the upper side of the desk 40 and the participant on the right side.

Note that the cutout unit 103 has been described as using the arrangement pattern to cut out, from the panoramic video, a video area including all participants at the local site, but this is not a limitation. For example, the cutout unit 103 may, without using the arrangement pattern, cut out a video area wider than the video area in which only the previously speaking participant was cut out, or may send the entire panoramic video to the transmission unit 107. However, using the arrangement pattern has the advantage that the cutout unit 103 knows the positions of the participants at the local site and can therefore cut out a video area that includes all of them.

In addition, the cutout unit 103 continues to cut out the video area including the participant who was speaking before (the previous speaker) (first sound source). The process then proceeds to step S39.

<Step S39>
The transmission unit 107 of the information processing apparatus 10 at the local site encodes the video data of the video areas cut out by the cutout unit 103, one showing all participants and one showing the previous speaker, and transmits the data to the information processing apparatus 10 at the partner site via the communication unit 115 and the network 2. In addition, the designation unit 106 of the information processing apparatus 10 at the local site generates designation information that designates in which of the divided areas 300a and 300b at the partner site the video data of each of these video areas is to be displayed, and the transmission unit 107 transmits this designation information to the information processing apparatus 10 at the partner site together with the video data. The process then proceeds to step S40.

<Step S40>
When the reception unit 108 of the information processing apparatus 10 at the partner site receives the video data from the information processing apparatus 10 at the local site via the network 2 and the communication unit 115, it decodes the video data and sends the decoded video data to the display control unit 110. The process then proceeds to step S41.

<Step S41>
In accordance with the received designation information, the display control unit 110 of the information processing apparatus 10 at the partner site continues to display the received video data of the previous speaker's video area in whichever of the divided areas 300a and 300b it was originally displayed. Also in accordance with the received designation information, the display control unit 110 switches the other divided area to display the received video data of the video area showing all participants. The process then proceeds to step S42.
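Steps S38 to S41 can be pictured as a small round trip: the sender builds designation information, and the receiver applies it to its divided areas. The sketch below is only an illustration under stated assumptions. The `Designation` record, the function names, and the idea that the sender keeps a record (`shown`) of what each divided area currently displays are all hypothetical; the source does not specify the format of the designation information.

```python
from dataclasses import dataclass

AREAS = ("300a", "300b")  # divided areas at the partner site


@dataclass(frozen=True)
class Designation:
    area: str     # divided area in which the video should be displayed
    content: str  # label of the cut-out video (hypothetical representation)


def build_designation(shown, previous_speaker):
    """Sender side (steps S38 and S39): keep the previous speaker where it
    is already shown and assign the all-participants video to the other area.
    Falls back to the first area if the speaker is not currently shown."""
    keep = next((a for a in AREAS if shown.get(a) == previous_speaker), AREAS[0])
    other = AREAS[1] if keep == AREAS[0] else AREAS[0]
    return [
        Designation(keep, previous_speaker),
        Designation(other, "all_participants"),
    ]


def apply_designation(shown, designations):
    """Receiver side (steps S40 and S41): update what each area displays."""
    updated = dict(shown)
    for d in designations:
        updated[d.area] = d.content
    return updated


# Participant Y spoke last and is shown in 300b; a remote participant now speaks.
before = {"300a": "X", "300b": "Y"}
after = apply_designation(before, build_designation(before, "Y"))
print(after)
```

In this sketch the area holding the previous speaker is untouched, so only one of the two divided areas changes, which matches the behaviour described in step S41.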

<Step S42>
The information processing apparatus 10 at the local site determines whether the video conference has ended. For example, the information processing apparatus 10 determines whether a user has performed an operation on the operation unit 114 to end the conference. If the video conference has ended (step S42: Yes), the video cutout operation ends; if it has not ended (step S42: No), the process returns to step S32.

Through the operations in steps S31 to S42 described above, the information processing apparatus 10 executes the video cutout operation in the two-screen switching mode.

In the operation in the two-screen switching mode described above, the designation unit 106 of the information processing apparatus 10 at the local site, which is the sender of the video data, generates the designation information that designates in which display area (for example, the divided areas 300a and 300b) of the display unit 117 at the partner site the video data of the video area cut out by the cutout unit 103 is to be displayed. However, this is not a limitation. For example, the information processing apparatus 10 at the partner site, which receives the video data, may itself determine in which display area of the display unit 117 the received video data is to be displayed.

Next, with reference to FIG. 9, a specific example of screen transitions in the two-screen switching mode will be described for the display area 300 of the display unit 117 (display device 215) at the partner site (hereinafter simply called the "display area 300" in the description of FIG. 9).

FIG. 9(1) shows a display example of the initial state of the display area 300. The display control unit 110 displays, for example, the video of an arbitrary participant at the local site in the divided area 300a of the display area 300 and the video of all participants at the local site in the divided area 300b.

FIG. 9(2) shows a display example of the display area 300 when participant X at the local site speaks from the state of (1). The display control unit 110 leaves the video of all participants at the local site displayed in the divided area 300b and switches the divided area 300a to the video of participant X, who is speaking at the local site (an example of the first video).

FIG. 9(3) shows a display example of the display area 300 when participant Y, who is different from participant X at the local site, speaks from the state of (2). The display control unit 110 continues to display the video of participant X, the previous speaker, in the divided area 300a and switches the divided area 300b to the video of participant Y, who is currently speaking at the local site (an example of the first video).

FIG. 9(4) shows a display example of the display area 300 when participant X at the local site speaks again from the state of (3). The display control unit 110 continues to display the video of participant Y, the previous speaker, in the divided area 300b and displays the video of participant X, who is currently speaking at the local site (an example of the first video), in the divided area 300a. However, because the video of participant X was already displayed in the divided area 300a in FIG. 9(3), the video displayed in the divided area 300a does not actually change.

FIG. 9(5) shows a display example of the display area 300 when participant Z, who is different from participants X and Y at the local site, speaks from the state of (3). The display control unit 110 continues to display the video of participant Y, the previous speaker, in the divided area 300b and switches the divided area 300a to the video of participant Z, who is currently speaking at the local site (an example of the first video).

FIG. 9(6) shows a display example of the display area 300 when, for example, the participants at the local site stop speaking because a participant at the partner site has started speaking from the state of (3). The display control unit 110 continues to display the video of participant Y, the previous speaker, in the divided area 300b and switches the divided area 300a to the video of all participants at the local site (an example of the second video).
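The transitions (1) to (6) of FIG. 9 follow one rule: the divided area showing the most recent local speaker is left alone, and the other divided area is the one that changes. The sketch below is one possible reading of that rule, not the patented implementation; the class `TwoScreenSwitcher`, its method names, and the string labels are hypothetical.

```python
ALL = "all participants"  # label standing in for the wide "second video"


class TwoScreenSwitcher:
    """Tracks what the divided areas 300a and 300b show (illustrative only)."""

    def __init__(self, area_a, area_b=ALL):
        self.areas = {"300a": area_a, "300b": area_b}
        self.current_area = None  # area showing the most recent local speaker

    @staticmethod
    def _other(area):
        return "300b" if area == "300a" else "300a"

    def local_speech(self, speaker):
        """A participant at the local site starts speaking."""
        shown_in = [a for a, who in self.areas.items() if who == speaker]
        if shown_in:
            # Already on screen, as in FIG. 9(4): nothing visibly changes.
            self.current_area = shown_in[0]
        else:
            # Replace the area that does NOT hold the previous speaker.
            target = "300a" if self.current_area is None else self._other(self.current_area)
            self.areas[target] = speaker
            self.current_area = target
        return dict(self.areas)

    def remote_speech(self):
        """A participant at the partner site starts speaking, as in FIG. 9(6)."""
        if self.current_area is not None:
            self.areas[self._other(self.current_area)] = ALL
        return dict(self.areas)


s = TwoScreenSwitcher("P")      # FIG. 9(1): arbitrary participant P, overview
print(s.local_speech("X"))      # FIG. 9(2)
print(s.local_speech("Y"))      # FIG. 9(3)
print(s.remote_speech())        # FIG. 9(6)
```

Because each event changes at most one divided area, a viewer watching the current speaker's area never sees that area switch away unexpectedly.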

As described above, when a participant at another site is speaking, the cutout unit 103 of the information processing apparatus 10 at the local site cuts out from the panoramic video a video area including all participants at the local site, continues to cut out the video area including the participant who was speaking before (the previous speaker), and has each of them displayed in the two-way divided display area 300 of the display device 215 at the partner site. As a result, when a participant at the partner site is speaking, for example, the participants at the partner site can grasp the overall atmosphere of the local site. Also, in such cases, the previous speaker at the local site is displayed in one of the divided areas of the two-way divided display area 300, so the speaker at the partner site can read the facial expression with which the participant who last spoke at the local site is listening to the speech.

Further, when sound is input to the input unit 112 of the information processing apparatus 10 at the local site (when a participant at the local site is speaking), the current speaker and the previous speaker at the local site are each displayed in the two-way divided display area 300 of the display device 215 at the partner site. This keeps the extent of screen transitions to a minimum. In addition, because the participants at the partner site are watching the divided area that shows the local site's speaker, when another participant at the local site starts speaking it is the image in the divided area they are not watching that switches, which reduces the stress caused by screen transitions.

Further, when there is one participant at the local site, the display area 300 of the display device 215 at the partner site is not divided and that participant is displayed in a fixed manner; when there are two participants at the local site, the display area 300 is divided into two and the two participants are each displayed in a fixed manner. In these cases no screen transition occurs, so the stress caused by screen transitions can be reduced.
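The dependence of the display behaviour on the number of local participants can be summarised in a few lines. The following is a minimal sketch, assuming the participant count is already known; the enum `OperationMode`, the member name `SINGLE_FIXED`, and the function `select_mode` are hypothetical names chosen for illustration, while the two-screen fixed and two-screen switching modes are the ones named in this description.

```python
from enum import Enum


class OperationMode(Enum):
    SINGLE_FIXED = "undivided display, fixed"         # hypothetical name
    TWO_SCREEN_FIXED = "two-screen fixed mode"
    TWO_SCREEN_SWITCHING = "two-screen switching mode"


def select_mode(participant_count):
    """Pick the operation mode from the number of participants at the site."""
    if participant_count < 1:
        raise ValueError("at least one participant is required")
    if participant_count == 1:
        return OperationMode.SINGLE_FIXED
    if participant_count == 2:
        return OperationMode.TWO_SCREEN_FIXED
    return OperationMode.TWO_SCREEN_SWITCHING


for n in (1, 2, 5):
    print(n, select_mode(n).value)
```

Only the third mode ever changes what a divided area shows, which is why the one- and two-participant cases involve no screen transitions.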

Note that the imaging unit 116 has been described as being implemented by the panoramic camera 210 and as capturing a panoramic image or panoramic video, but it is not necessarily limited to this. That is, where the imaging range does not need to cover all 360 degrees, a panoramic camera need not be used; for example, an imaging device (camera) with an angle of view that covers the required imaging range may be used. In that case, the video is cut out within the range of the angle of view that the imaging device can capture.

Further, when no sound is input to the input unit 112 of the information processing apparatus 10 at the local site (when there is no speaker), that is, when a participant at the partner site is speaking, the cutout unit 103 of the information processing apparatus 10 at the local site has been described as cutting out from the panoramic video a video area including all participants at the local site, but this is not a limitation. That is, as long as the participants at the partner site can still grasp the atmosphere of the local site when a participant at the partner site is speaking, the cutout unit 103 may cut out a video area including at least one participant rather than all participants at the local site. The cutout unit 103 may also, for example, cut out a video area wider than a video area in which only a specific participant is cut out, or may send the entire panoramic video to the transmission unit 107.

(Modification)
The operation of the conference system 1 according to this modification will be described, focusing on the differences from the operation of the conference system 1 according to the embodiment described above. The embodiment described the video cutout operation for a video conference between two sites; this modification describes the operation when a video conference is held among three or more sites.

FIG. 11 is a diagram illustrating an example of video display when a video conference is held among three or more sites. The video cutout operation in that case will be described with reference to FIG. 11.

First, the information processing apparatuses 10 at the respective sites exchange operation mode information with one another via the network 2. This allows the information processing apparatus 10 at each site to recognize the operation modes of the others as well as the number of participating sites. The example in FIG. 11 shows the display state of the display screen 215a at the local site (site A) when a video conference is held among four sites. In that case, as shown in FIG. 11, the display screen 215a at the local site displays a display area for each of the other sites B to D. The display areas shown on the display screen 215a are the display area 301 (specific display area) for site B, the display area 302 (specific display area) for site C, and the display area 303 (specific display area) for site D.

In the example of FIG. 11, the information processing apparatus 10 at the local site (site A) is assumed to have received information indicating operation in the two-screen fixed mode from the information processing apparatus 10 at site B, and information indicating operation in the two-screen switching mode from the information processing apparatuses 10 at sites C and D. On the display screen 215a of the display unit 117 (display device 215), the display control unit 110 of the information processing apparatus 10 at the local site then divides the display area 301 into two to generate the divided areas 301a and 301b, divides the display area 302 into two to generate the divided areas 302a and 302b, and divides the display area 303 into two to generate the divided areas 303a and 303b. Even when a video conference is held among three or more sites, the video cutout operation between each pair of sites is the same as the operation described above with reference to FIGS. 8 to 10.

The example of FIG. 11 shows the state when a participant at the local site (site A) is speaking. Because the information processing apparatus 10 at site B is operating in the two-screen fixed mode, site B has two participants, and the divided areas 301a and 301b each display the video of one of the two participants in a fixed manner. Because the information processing apparatus 10 at site C is operating in the two-screen switching mode, site C has three or more participants; the divided area 302a displays the video of the previous speaker at site C, and the divided area 302b displays the video of all participants at site C. Likewise, because the information processing apparatus 10 at site D is operating in the two-screen switching mode, site D has three or more participants; the divided area 303a displays the video of all participants at site D, and the divided area 303b displays the video of the previous speaker at site D.
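The multi-site layout of FIG. 11 can be sketched as a mapping from each remote site's reported operation mode to the divided areas created for it. This is an illustrative sketch only: the function `build_layout`, the mode strings, and the rule that display areas are numbered from 301 in the order the sites are listed are assumptions made for the example.

```python
TWO_SCREEN_MODES = {"two_screen_fixed", "two_screen_switching"}


def build_layout(remote_modes, first_area=301):
    """Map each remote site to the divided areas created for it.

    remote_modes: dict of site name -> reported operation mode string.
    Sites reporting a two-screen mode get their display area split into
    an 'a' and a 'b' half; any other mode keeps one undivided area.
    """
    layout = {}
    for offset, (site, mode) in enumerate(remote_modes.items()):
        area = str(first_area + offset)
        if mode in TWO_SCREEN_MODES:
            layout[site] = [area + "a", area + "b"]
        else:
            layout[site] = [area]
    return layout


# Modes reported to site A in the FIG. 11 example.
modes = {
    "B": "two_screen_fixed",
    "C": "two_screen_switching",
    "D": "two_screen_switching",
}
print(build_layout(modes))
```

Each pair of sites can then run the two-site cutout logic independently within its own display area.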

As described above, when a video conference is held among three or more sites, the video cutout operation between each pair of sites can be realized by applying the operation described above with reference to FIGS. 8 to 10. The same effects as in the embodiment described above can thus be obtained with each site.

In the embodiment and modification described above, when at least one of the functional units of the information processing apparatus 10 is realized by executing a program, the program is provided by being incorporated in advance in a ROM or the like. The program executed by the information processing apparatus 10 according to the embodiment and modification may also be provided as a file in an installable or executable format stored on a computer-readable storage medium such as a CD-ROM (Compact Disc Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk-Recordable), or a DVD (Digital Versatile Disc). The program may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network, or may be provided or distributed via a network such as the Internet. The program has a module configuration including at least one of the functional units described above. As actual hardware, the CPU 201 reads the program from the storage device described above (for example, the ROM 202 or the auxiliary storage device 204) and executes it, whereby the functional units are loaded onto and generated on the main storage device (for example, the RAM 203).

DESCRIPTION OF SYMBOLS
1 Conference system
2 Network
10, 10a, 10b Information processing apparatus
20 Conference server
40 Desk
50 Whiteboard
60a to 60e Participants
61a to 61d Participants
101 Recognition unit
102 Arrangement specifying unit
103 Cutout unit
104 Direction specifying unit
105 Management unit
106 Designation unit
107 Transmission unit
108 Reception unit
109 Imaging control unit
110 Display control unit
111 Audio output control unit
112 Input unit
113 Storage unit
114 Operation unit
115 Communication unit
116 Imaging unit
117 Display unit
118 Audio output unit
201 CPU
202 ROM
203 RAM
204 Auxiliary storage device
205 Media drive
205a Recording medium
206 Operation button
207 Power switch
208 Network I/F
209 Image sensor I/F
210 Panoramic camera
211 Audio I/F
212 Microphone array
212a to 212f Microphones
213 Speaker
214 Output I/F
214a Cable
215 Display device
215a Display screen
216 External device I/F
217 Bus
300 to 303 Display areas
300a, 300b Divided areas
301a, 301b Divided areas
302a, 302b Divided areas
303a, 303b Divided areas
400a, 400b Cutout ranges
P1 to P7 Areas

JP 2010-081644 A

Claims

An information processing apparatus that communicates video and sound with another information processing apparatus, comprising:
an input unit that receives input of sound from a sound source;
an imaging unit that captures video of the sound source;
a reception unit that receives video and sound from the other information processing apparatus;
a cutout unit that, when sound is input from the input unit, cuts out a video region including the sound source that emitted the sound from the video captured by the imaging unit as a first video, and, when sound is received from the other information processing apparatus by the reception unit, cuts out a video region in a range wider than at least the first video from the video captured by the imaging unit as a second video; and
a transmission unit that transmits at least one of the first video and the second video to the other information processing apparatus.

The information processing apparatus according to claim 1, further comprising a designation unit that generates designation information designating in which display area of the other information processing apparatus each of the first video and the second video cut out by the cutout unit is to be displayed,
wherein the transmission unit also transmits the designation information when transmitting at least one of the first video and the second video.

The information processing apparatus according to claim 2, further comprising a first specifying unit that specifies the direction of sound input from the input unit, wherein:
when sound is received from the other information processing apparatus by the reception unit, the cutout unit cuts out a video region including a first sound source corresponding to the direction, specified by the first specifying unit, of the sound that was input by the input unit immediately before the sound was received from the other information processing apparatus by the reception unit;
the designation unit generates designation information designating that the video region including the first sound source cut out by the cutout unit is to be displayed in a display area of the other information processing apparatus other than the display area in which the second video is displayed; and
the transmission unit transmits the video region including the first sound source to the other information processing apparatus together with the designation information.

The information processing apparatus according to claim 1, further comprising:
a recognition unit that recognizes sound sources from the video captured by the imaging unit; and
a second specifying unit that specifies an arrangement pattern indicating in which direction relative to the information processing apparatus each sound source recognized by the recognition unit is arranged in the video captured by the imaging unit,
wherein, when sound is received from the other information processing apparatus by the reception unit, the cutout unit cuts out from the video, as the second video, a video region including all sound sources whose arrangement directions are specified by the arrangement pattern.

When the sound is input from the input unit, the cutout unit
cuts out from the video a video region including the sound source in the direction, specified by the first specifying unit, of the sound input to the input unit, and
cuts out from the video a video region including a second sound source that emitted sound before that sound source,
The designation unit generates designation information designating that the video region including the second sound source is to continue being displayed and that the video region including the sound source in the direction of the sound input to the input unit is to be displayed,
The transmission unit transmits the video area including the second sound source, the video area including the sound source in the direction of the sound input to the input unit, and the designation information to the other information processing apparatus. The information processing apparatus described.

The information processing apparatus according to claim 4, further comprising a switching unit that obtains the number of sound sources recognized by the recognition unit and switches the information processing apparatus to a first operation mode when the number of sound sources is 1, to a second operation mode when the number of sound sources is 2, and to a third operation mode when the number of sound sources is 3 or more, wherein:
in the first operation mode,
the cutout unit cuts out from the video a video region including the one sound source, and
the transmission unit transmits the video region to the other information processing apparatus; and
in the second operation mode,
the cutout unit cuts out from the video video regions respectively including the two sound sources, and
the transmission unit transmits the video regions respectively including the two sound sources cut out by the cutout unit to the other information processing apparatus.

The information processing apparatus according to claim 6, wherein, at every predetermined time:
the recognition unit recognizes sound sources from the video captured by the imaging unit; and
the switching unit obtains the number of sound sources recognized by the recognition unit and, based on the number of sound sources, switches to the first operation mode, the second operation mode, or an operation mode other than the first operation mode and the second operation mode.

The information processing apparatus according to any one of claims 1 to 7, further comprising a display control unit that causes a display unit to display the first video or the second video when the first video or the second video is received from the other information processing apparatus by the reception unit.

A conference system comprising:
the information processing apparatus according to any one of claims 1 to 8; and
a server device that performs communication control between the information processing apparatus and the other information processing apparatus.

An information processing method for an information processing apparatus that communicates video and sound with another information processing apparatus, the method comprising:
an input step of receiving input of sound from a sound source;
an imaging step of capturing video of the sound source;
a receiving step of receiving video and sound from the other information processing apparatus;
a step of, when sound is input, cutting out a video region including the sound source that emitted the sound from the captured video as a first video, and, when sound is received from the other information processing apparatus, cutting out a video region in a range wider than at least the first video from the captured video as a second video; and
a transmission step of transmitting at least one of the first video and the second video to the other information processing apparatus.

A program for causing a computer that communicates video and sound with another information processing apparatus to execute:
a receiving step of receiving video and sound from the other information processing apparatus;
a step of, when sound is input from an input unit that receives input of sound from a sound source, cutting out a video region including the sound source that emitted the sound from video captured by an imaging unit that captures video of the sound source as a first video, and, when sound is received from the other information processing apparatus, cutting out a video region in a range wider than at least the first video from the video captured by the imaging unit as a second video; and
a transmission step of transmitting at least one of the first video and the second video to the other information processing apparatus.