JP4957221B2

JP4957221B2 - Communication device

Info

Publication number: JP4957221B2
Application number: JP2006327386A
Authority: JP
Inventors: 紀行畑
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-12-04
Filing date: 2006-12-04
Publication date: 2012-06-20
Anticipated expiration: 2026-12-04
Also published as: JP2008141611A

Description

The present invention relates to a technique for transmitting and receiving video data.

In recent years, conference systems have become available that allow conferences to be held between remote locations by exchanging video data and audio data over a network. Video data generally carries far more information than audio data, and its volume has grown further as the image quality of video cameras has improved, so video communication tends to place a heavy load on the network. Under these conditions, when video and audio are transmitted in synchronization, the increased network load caused by the large amount of video data can interrupt the audio, which is a problem when holding a conference. To reduce the amount of image data transmitted, it has therefore also been considered to lower the network load by using the technique disclosed in Patent Document 1, which transmits video data only when the video changes.

Patent Document 1: JP-A-11-225326

However, determining whether the video has changed, as in Patent Document 1, places a very heavy load on the CPU (Central Processing Unit) that serves as the control unit of the conference system, and the data processing takes time. Consequently, when this technique is applied to a conference system, transmission of the video data lags behind transmission of the audio data, and when the video changes frequently, a large amount of video data is transmitted and the audio is sometimes interrupted. Furthermore, the CPU would have to be upgraded to a higher-performance part, which is very costly.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a transmission device, a reception device, and a communication device in which audio is unlikely to be interrupted even when image data is transmitted and received.

To solve the above problems, the present invention provides a communication device that connects to a communication network and communicates with another communication device, the communication device comprising: imaging means for generating video of a first area within an imaging range as first video data; area setting means for setting a plurality of partial ranges of the first area as second areas; area generation means for generating one or more third areas by combining the plurality of second areas; sound collection means for collecting sound from a sound source and generating audio data; sound source direction identifying means for identifying the direction of the sound source based on the audio data generated by the sound collection means; area selection means for selecting, from the plurality of set second areas, a second area that contains a position corresponding to the identified direction of the sound source; available bandwidth measuring means for successively measuring, at predetermined time intervals, a transmission rate at which transmission to the other communication device via the communication network succeeds with at least a predetermined probability; choosing means for choosing, based on the transmission rate, one of the first area, the second area selected by the area selection means, and a third area containing the second area selected by the area selection means; video data cutout means for cutting the video of the second area out of the first video data to generate second video data, and cutting the video of the third area out of the first video data to generate third video data; and transmission means for transmitting, to the other communication device, information indicating the area chosen by the choosing means together with the video data corresponding to that area.

In another preferred aspect, the sound collection means may have a plurality of microphones, and the sound source direction identifying means may identify the direction of the sound source based on the audio data generated by each of the plurality of microphones picking up sound from the sound source.

In another preferred aspect, the video data cutout means may generate, based on the transmission rate, third video data compressed so as to reduce the amount of data.

In another preferred aspect, the video data cutout means may generate, based on the transmission rate, second video data compressed so as to reduce the amount of data.

In another preferred aspect, the communication device may further comprise: receiving means for receiving, from the other communication device, information indicating an area together with the video data corresponding to that area; and video data generation means for generating fourth video data such that, when the receiving means receives the information indicating the area and the video data, the video obtained by reproducing the received video data is displayed overwriting the range, corresponding to the received area, of the video obtained by reproducing the immediately preceding video data.
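The overwrite performed by the video data generation means can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes decoded frames are held as row-major lists of pixel values and that the received area is identified by its upper-left corner in the pixel coordinate system defined in the embodiment. The name `overlay_area` is hypothetical.

```python
from typing import List, Sequence

Frame = List[List[int]]  # row-major: frame[y][x] is one pixel value


def overlay_area(previous: Sequence[Sequence[int]], patch: Sequence[Sequence[int]],
                 x_left: int, y_top: int) -> Frame:
    """Return a copy of `previous` with `patch` written over the area whose
    upper-left pixel is (x_left, y_top); pixels outside the area are kept."""
    result = [list(row) for row in previous]  # copy so the previous frame is untouched
    height, width = len(result), len(result[0])
    if y_top + len(patch) > height or any(x_left + len(r) > width for r in patch):
        raise ValueError("area does not fit inside the previous frame")
    for dy, patch_row in enumerate(patch):
        # Slice assignment of equal length replaces pixels without resizing the row.
        result[y_top + dy][x_left:x_left + len(patch_row)] = list(patch_row)
    return result
```

For instance, writing a 2 × 2 patch at (1, 1) into a 3 × 4 frame of zeros changes only the four pixels inside that area; every other pixel keeps its value from the previous frame.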

According to the present invention, it is possible to provide a communication device in which audio is unlikely to be interrupted even when image data is transmitted and received.

An embodiment of the present invention is described below.

<Embodiment>
FIG. 1 is a block diagram showing the configuration of a communication system 1 including the communication devices according to this embodiment. The communication system 1 includes a communication device 100a, a communication device 100b, and a communication network 10; the communication devices 100a and 100b are connected to the communication network 10 by wire or wirelessly. The communication devices 100a and 100b have the same configuration, and hereinafter they are collectively referred to as the communication device 100 when there is no need to distinguish them. Although two communication devices 100 are connected to the communication network 10 here, three or more communication devices 100 may be connected.

In this embodiment, the following communication protocols are used. At the application layer, RTP (Real-time Transport Protocol) is used to transfer audio data and video data. RTP is a protocol that provides end-to-end, real-time transmission and reception of audio and video data; it is specified in RFC 1889. In RTP, communication devices exchange data by generating, sending, and receiving RTP packets. UDP (User Datagram Protocol) is used at the transport layer and IP (Internet Protocol) at the network layer. Each of the communication devices 100a and 100b is assigned an IP address and is uniquely identified on the network. UDP and IP are widely used protocols, so their description is omitted.

Next, the configuration of the communication device 100 is described. FIG. 2 is a block diagram showing the configuration of the communication device 100. In the following description, when it is necessary to indicate whether a component belongs to the communication device 100a or the communication device 100b, a letter suffix is appended; for example, the CPU 101 of the communication device 100a is written as CPU 101a.

The CPU 101 reads a program stored in a ROM (Read Only Memory) 102, loads it into a RAM (Random Access Memory) 103, and executes it, thereby controlling each unit of the communication device 100 via a bus 110. The RAM 103 also stores the audio data output from the audio input unit 104, the video data output from the video input unit 105, and audio and video data received via the communication network 10, and serves as a work area when the CPU 101 processes the stored data.

The audio input unit 104 includes a microphone array and an A/D converter. The A/D converter converts the analog audio signal input from the microphone array into digital audio data. The microphone array has a plurality of microphones arranged horizontally. By analyzing the audio data generated when sound emitted from a sound source is picked up by the plurality of microphones, and calculating the differences in the sound's arrival time at each microphone, the CPU 101 can identify the position of the sound source as a horizontal angle as seen from the communication device 100.

For example, as shown in FIG. 3, when sound emitted from a particular sound source S is picked up by four microphones 1041, 1042, 1043, and 1044, the sound arrives at different times because the distances from the sound source S to the microphones differ. For example, the distance from the sound source S to the microphone 1041 is longer by d than the distance to the microphone 1044, so the microphone 1041 picks up the sound emitted from the sound source S later than the microphone 1044 by d/v (v: speed of sound). By analyzing the audio data output from the audio input unit 104 and calculating these time differences, the CPU 101 can compute the differences in distance from the sound source S to each of the microphones 1041, 1042, 1043, and 1044, and can thereby identify the horizontal angle θ of the position of the sound source S. Here, θ is defined as 0 degrees in the direction M perpendicular to the horizontally arranged microphones (in this embodiment, the front direction of the communication device 100). Although the distance to the sound source S could also be calculated in addition to its direction, only the direction is identified in this embodiment.
Likewise, when sound is emitted from multiple sound sources, the direction of each source can be identified, but in this embodiment the direction with the highest volume is taken as the direction of the sound source.
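The step from arrival-time difference to angle can be sketched as below. The sketch assumes a far-field source, so that for two microphones a distance D apart the path difference is d = D sin θ and therefore θ = arcsin(vΔt / D). It uses only one microphone pair rather than all four, an assumed speed of sound of 343 m/s, and a brute-force cross-correlation over integer sample lags. The function names, the parameter values, and the sign convention (which side counts as positive θ depends on which microphone is passed as `late`) are illustrative assumptions, not taken from the patent.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def estimate_delay_samples(late: Sequence[float], early: Sequence[float], max_lag: int) -> int:
    """Integer lag (in samples) by which `late` trails `early`, chosen to
    maximise the cross-correlation sum(late[n] * early[n - lag])."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(late)):
            m = n - lag
            if 0 <= m < len(early):
                score += late[n] * early[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle in degrees from the array normal M: d = v * delay, sin(theta) = d / D."""
    path_difference = SPEED_OF_SOUND * delay_s  # d in the text
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))  # clamp numerical overshoot
    return math.degrees(math.asin(ratio))
```

With a sample rate fs, the lag returned by `estimate_delay_samples` converts to seconds as lag / fs before being passed to `direction_from_delay`.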

The video input unit 105 includes an image sensor such as a CCD or CMOS sensor, captures images with a predetermined image size (number of pixels) and number of frames per unit time, and generates video data. In this embodiment, the image sensor of the video input unit 105 and the microphones of the audio input unit 104 are fixed to the communication device 100. Fixing them in this way preserves the positional relationship between a sound source within the imaging range of the image sensor and the sound-source direction identified by the CPU 101. For example, as shown in FIG. 4, the horizontal angle θ is 0 degrees in the center direction M of the imaging range of the image sensor, positive to the right in the figure and negative to the left. FIG. 4 is a cross-section of the imaging range of the image sensor taken along a plane containing the horizontally arranged microphones; the right side of the figure appears on the right side of the screen when the video is displayed on the display unit 107 described below.
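Because the camera and microphones are fixed relative to each other, an identified angle θ can be mapped onto a pixel column and compared with the personal areas described later. The sketch below assumes an ideal pinhole camera whose optical axis is the center direction M and whose horizontal field of view is known; the patent does not specify this mapping, and `Area`, `angle_to_column`, and `select_personal_area` are hypothetical names. Only the x coordinate is compared, since only the horizontal angle is identified.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Area:
    """Rectangle with inclusive pixel corners (xL, yL)-(xR, yR)."""
    name: str
    x_left: int
    y_top: int
    x_right: int
    y_bottom: int


def angle_to_column(theta_deg: float, image_width: int, hfov_deg: float) -> float:
    """Pixel column seen at horizontal angle theta (0 = centre direction M, positive = right),
    assuming an ideal pinhole camera with horizontal field of view hfov_deg."""
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    centre = (image_width - 1) / 2
    return centre + focal_px * math.tan(math.radians(theta_deg))


def select_personal_area(theta_deg: float, areas: Sequence[Area],
                         image_width: int, hfov_deg: float) -> Optional[Area]:
    """Return the first personal area whose x range contains the sound-source column, if any."""
    x = angle_to_column(theta_deg, image_width, hfov_deg)
    for area in areas:
        if area.x_left <= x <= area.x_right:
            return area
    return None
```

Returning `None` when the column falls between areas leaves the caller to decide what to send in that case; the patent text in this section does not say.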

The operation unit 106 is, for example, a keyboard or a mouse; when the operator of the communication device 100 operates it, data representing the operation is output to the CPU 101.

The display unit 107 is a display device, such as a liquid crystal display, that displays video on a screen based on the input video data.

The audio output unit 108 emits sound from input audio data and includes a speaker and a D/A converter. The D/A converter converts the input digital audio data into an analog audio signal and outputs it to the speaker, which emits the sound.

The communication IF (interface) 109 is, for example, a NIC (Network Interface Card) and is connected to the communication network 10. The communication IF 109 generates RTP packets for the various data to be transmitted, such as audio data and video data, encapsulates them in turn according to the lower-layer protocols, and transmits the resulting IP packets to the communication network 10. Encapsulation here means generating a UDP packet whose payload contains the RTP packet, and then generating an IP packet whose payload contains that UDP packet. The communication IF 109 also receives IP packets from the communication network 10, reverses this encapsulation to extract the RTP packets carried in them, and outputs the data read from those RTP packets to the CPU 101.
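The nesting described above can be illustrated for the UDP layer. In practice the operating system builds the UDP and IP headers when an application sends through a datagram socket, so the sketch below, which hand-builds only the 8-byte UDP header defined in RFC 768 and leaves out the IP layer, is purely illustrative. The function names are hypothetical and any port numbers passed to them are arbitrary.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def wrap_in_udp(rtp_packet: bytes, src_port: int, dst_port: int) -> bytes:
    """Place an RTP packet in the payload of a UDP datagram (RFC 768 header).
    A checksum of 0 means "no checksum", which UDP permits over IPv4 only."""
    length = UDP_HEADER.size + len(rtp_packet)  # 8-byte header plus payload
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + rtp_packet


def unwrap_udp(datagram: bytes) -> bytes:
    """Reverse step on the receiving side: strip the UDP header, return the RTP packet."""
    _src, _dst, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return datagram[UDP_HEADER.size:length]
```

For example, `wrap_in_udp(rtp_bytes, 5004, 5004)` yields a datagram 8 bytes longer than the RTP packet, and `unwrap_udp` recovers the original bytes.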

RTP packetization of the audio data and video data is described next. Like a packet, the unit of data transfer in IP, an RTP packet consists of a payload part to which a header part is attached.

Four types of data are set in the header part: a time stamp, a payload type, a sequence number, and area information. The time stamp indicates a time (the time elapsed since the start of communication was instructed). The payload type lets the destination of a communication message identify the message's type. Four message types are used in this embodiment: a video data transmission message, an audio data transmission message, a test data transmission message, and a reception notification message, for which the payload type is set to "1", "2", "3", and "4", respectively. The sequence number is an identifier that uniquely identifies each packet; for example, when one piece of audio data is split into multiple RTP packets for transmission, the packets are given sequence numbers 1, 2, 3, and so on. The area information is used in video data transmission messages to specify which area the video data in the RTP packet corresponds to; the coordinates of that area are written into it. The areas are the personal area, the active area, and the all area, which are described in detail later.

In the payload part, video data or audio data covering a predetermined length of time is written in a video data transmission message or an audio data transmission message, respectively. Test data is written in a test data transmission message, and the sequence numbers of the received packets are written in a reception notification message. The test data transmission message and the reception notification message are described later.
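The four header fields and the payload can be modeled as follows. The field widths and byte order are assumptions made for illustration (the text names the fields but not their sizes), and the layout is not the fixed RTP header defined in RFC 1889. `Message` and `HEADER` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Payload type values given in the text.
VIDEO_DATA, AUDIO_DATA, TEST_DATA, RECEPTION_NOTICE = 1, 2, 3, 4

# Hypothetical byte layout (network byte order, no padding): 32-bit timestamp,
# 8-bit payload type, 16-bit sequence number, then the area as four 16-bit
# coordinates xL, yL, xR, yR.
HEADER = struct.Struct("!IBH4H")


@dataclass
class Message:
    timestamp: int
    payload_type: int
    sequence: int
    area: Tuple[int, int, int, int]  # (xL, yL, xR, yR); unused for non-video messages
    payload: bytes

    def pack(self) -> bytes:
        return HEADER.pack(self.timestamp, self.payload_type, self.sequence, *self.area) + self.payload

    @classmethod
    def unpack(cls, data: bytes) -> "Message":
        ts, ptype, seq, *area = HEADER.unpack_from(data)
        return cls(ts, ptype, seq, tuple(area), data[HEADER.size:])
```

A fixed-size header keeps parsing trivial on the receiving side: everything after `HEADER.size` bytes is payload.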

Next, the operations that the CPU 101 of the communication device 100 carries out by executing the program stored in the ROM 102 when the communication system 1 is used for a remote conference are described.

In this embodiment, a remote conference is held between room a and room b. The communication device 100a is installed in room a, where five participants 201, 202, ..., 205 take part in the conference, and the communication device 100b is installed in room b, where three participants 301, 302, and 303 take part. Each communication device 100 is installed so that the image sensor of its video input unit 105 captures the participants seated around the desk 200 or 300. FIG. 5 shows what the video input unit 105a of the communication device 100a captures; video data of this imaging range is transmitted to the communication device 100b and displayed on its display unit 107b. FIG. 6 shows what the video input unit 105b of the communication device 100b captures; video data of this imaging range is transmitted to the communication device 100a and displayed on its display unit 107a.

First, the personal areas are set before the remote conference starts. Setting the personal areas is described with reference to FIGS. 5 and 6. A participant in room a operates the operation unit 106a while viewing the video on the display unit 107a and designates the positions of the participants in room b. During this step, the communication device 100b may transmit the video data it captures to the communication device 100a unprocessed, or it may reduce the number of frames per unit time so that the video is sent in a nearly still-image state. The positions are designated by enclosing each of the participants 301, 302, and 303 displayed on the display unit 107a in a rectangle, as shown by the broken lines in FIG. 6, and the CPU 101a recognizes each rectangle as a personal area. The CPU 101a represents a personal area by the pixel coordinates of the rectangle's upper-left and lower-right corners: with screen positions expressed in pixels, the upper-left corner of the screen is (0, 0), and the point x pixels to the right of it and y pixels below it is (x, y), so one personal area is recognized by its upper-left coordinates (xL, yL) and lower-right coordinates (xR, yR).
Hereinafter, the rightward direction of the screen is called the x direction and its coordinate the x coordinate, and the downward direction is called the y direction and its coordinate the y coordinate.
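Cutting an area out of a frame with the inclusive corner convention just defined might look like this. It is a sketch assuming frames are held as lists of pixel rows indexed as frame[y][x]; `cut_out` is a hypothetical name. Note the `+ 1` on the right and bottom bounds, which makes (xR, yR) part of the area.

```python
from typing import List, Sequence

Frame = List[List[int]]  # frame[y][x]; (0, 0) is the upper-left pixel


def cut_out(frame: Sequence[Sequence[int]], x_left: int, y_top: int,
            x_right: int, y_bottom: int) -> Frame:
    """Return the sub-image bounded by the inclusive corners (xL, yL) and (xR, yR)."""
    if not (0 <= x_left <= x_right < len(frame[0]) and 0 <= y_top <= y_bottom < len(frame)):
        raise ValueError("area lies outside the frame")
    return [list(row[x_left:x_right + 1]) for row in frame[y_top:y_bottom + 1]]
```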

The CPU 101a of the communication device 100a then transmits information on the three recognized personal areas to the communication device 100b, so that the CPU 101b recognizes the positions of the personal areas within the imaging range of the video input unit 105b of the communication device 100b. In room b, the communication device 100b is operated in the same way as in room a, so that the CPU 101a recognizes the positions of the personal areas within the imaging range of the video input unit 105a of the communication device 100a. Hereinafter, the personal area corresponding to the participant 201 is denoted PSA201, the one corresponding to the participant 202 is denoted PSA202, and so on for the other participants; the upper-left coordinates of PSA201 are (xL201, yL201) and its lower-right coordinates are (xR201, yR201). When there is no need to distinguish between participants, the area is simply called a personal area, with upper-left coordinates (xL, yL) and lower-right coordinates (xR, yR).

Next, the CPU 101 of each communication device 100 generates and recognizes an active area, described below, by combining a plurality of the recognized personal areas, and recognizes the entire imaging range as the all area. The active area is the smallest rectangle that contains the plurality of personal areas. That is, its upper-left coordinates are given by the minimum xL (xLmin) and the minimum yL (yLmin) among the upper-left coordinates of those personal areas, and its lower-right coordinates by the maximum xR (xRmax) and the maximum yR (yRmax) among their lower-right coordinates, so the active area has upper-left coordinates (xLmin, yLmin) and lower-right coordinates (xRmax, yRmax).

An active area that contains all the personal areas on the screen is called the all-active area; it corresponds to the part outlined by the two-dot chain line in FIGS. 5 and 6. In FIG. 5, the upper-left coordinates (xLmin, yLmin) are (xL201, yL203) and the lower-right coordinates (xRmax, yRmax) are (xR205, yR201). Active areas can be generated from various combinations of personal areas, such as two adjacent personal areas, but in this embodiment the CPU 101 generates and recognizes only the all-active area, which is therefore also simply called the active area below.

Because the all area is the entire imaging range, its upper-left coordinates are (0, 0) and its lower-right coordinates are (xmax, ymax), where xmax is the number of pixels of the imaging range in the x direction minus 1 and ymax is the number of pixels in the y direction minus 1; for an imaging range of 640 × 480 pixels, xmax = 639 and ymax = 479. Each CPU 101 thus recognizes, as coordinates, the all area, the active area, and each personal area of the room in which its communication device 100 is installed. If there is only one personal area, the active area may be treated as identical to that personal area.
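The active-area and all-area rules above reduce to a bounding-box computation. A minimal sketch, using hypothetical names and representing each area as an (xL, yL, xR, yR) tuple:

```python
from typing import Iterable, Tuple

Rect = Tuple[int, int, int, int]  # (xL, yL, xR, yR), inclusive pixel corners


def active_area(personal_areas: Iterable[Rect]) -> Rect:
    """Smallest rectangle containing every given personal area:
    (xLmin, yLmin) to (xRmax, yRmax)."""
    areas = list(personal_areas)
    if not areas:
        raise ValueError("at least one personal area is required")
    return (min(a[0] for a in areas), min(a[1] for a in areas),
            max(a[2] for a in areas), max(a[3] for a in areas))


def all_area(width: int, height: int) -> Rect:
    """The whole imaging range: (0, 0) to (xmax, ymax) = (width - 1, height - 1)."""
    return (0, 0, width - 1, height - 1)
```

With a single personal area, `active_area` returns that area unchanged, which matches the rule for the case where there is only one personal area.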

When the personal areas have been set, communication of video data and audio data begins. Audio data is packetized into IP packets by the communication IF 109 and transmitted as audio data transmission messages. Video data is likewise packetized into IP packets by the communication IF 109 and transmitted as video data transmission messages, but the transmission method depends on the available bandwidth of the network. Video data transmission is described below with reference to the flowchart of FIG. 7, taking as an example the case in which the communication device 100a transmits to the communication device 100b. Transmission from the communication device 100b to the communication device 100a is carried out in the same way, so its description is omitted.

When video data transmission starts, the CPU 101a of the communication device 100a first transmits video data of the all area to the communication device 100b via the communication IF 109a (S100). Because this video only needs to be recognizable as a still image, a single frame is sufficient. Next, the communication device 100a measures the available bandwidth (S110).

The available bandwidth measurement is described here. It measures the maximum communication bandwidth usable on the communication network 10 when the communication device 100a communicates with the communication device 100b via that network. First, the CPU 101a determines the interval at which packets are to be sent. Next, the CPU 101a generates test data equivalent to video data; the communication IF 109a writes the test data into the payload parts of RTP packets to form a plurality of test data transmission messages, encapsulates each of them into an IP packet, and sends the IP packets to the communication device 100b at the determined interval. At this time, the CPU 101a stores the sequence number of each transmitted RTP packet in the RAM 103a.

When the CPU 101b of the communication device 100b recognizes that test data transmission messages have arrived, the communication IF 109b generates a reception notification message listing the sequence number of each packet received, packs it into an IP packet, and sends it to the communication device 100a. From the reception notification message and the test data transmission messages it sent, the CPU 101a of the communication device 100a calculates the packet loss rate of the test transmission (number of packets not received / number of packets sent).
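The loss-rate calculation above amounts to a set difference between the sequence numbers the sender stored and those the receiver reported back. The following is a minimal sketch, assuming each probe burst is short enough that the 16-bit RTP sequence number does not wrap around within it. The function name and sample values are hypothetical.

```python
def packet_loss_rate(sent_seq, received_seq):
    """Fraction of probe packets that never arrived: lost / sent."""
    sent = set(sent_seq)
    if not sent:
        raise ValueError("no probe packets were sent")
    # Assumes no 16-bit sequence-number wraparound inside one burst.
    # Sequence numbers the receiver reports but we never sent are ignored.
    lost = sent - set(received_seq)
    return len(lost) / len(sent)


# Hypothetical burst: sender stored 100..109, receiver reported eight of them.
sent = list(range(100, 110))
reported = [100, 101, 102, 104, 105, 106, 108, 109]
print(packet_loss_rate(sent, reported))  # 0.2
```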

If the packet loss rate is below a predetermined threshold, the CPU shortens the transmission interval, repeats the same procedure, and recalculates the loss rate. If the loss rate is at or above the threshold, it lengthens the interval, repeats the procedure, and recalculates. A shorter interval means more data sent per unit time, that is, a higher transmission rate. A longer interval means a lower transmission rate. While the transmission rate is small relative to the available bandwidth, little packet loss occurs, and the loss rate rises as the transmission rate grows relative to the available bandwidth. Frequent packet loss therefore indicates that the transmission rate greatly exceeds the available bandwidth. Repeating this process until the shortest interval at which the packet loss rate stays below the threshold is found gives the transmission rate that corresponds to the available bandwidth.
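The text describes shortening or lengthening the interval step by step until the shortest acceptable interval is found. The sketch below swaps in a bisection search for the same boundary, which is a substitution and not necessarily the authors' procedure. It assumes that loss falls monotonically as the interval grows. The probe function stands in for a real probe burst, and the toy link model, packet size and capacity are hypothetical.

```python
PACKET_BITS = 1200 * 8        # hypothetical probe payload size (bits)
CAPACITY_BPS = 2_000_000      # hypothetical link capacity for the simulation


def simulated_probe(interval_s):
    """Toy link model: traffic above capacity is dropped."""
    rate = PACKET_BITS / interval_s
    return max(0.0, 1.0 - CAPACITY_BPS / rate)


def shortest_ok_interval(probe, lo, hi, threshold, iterations=30):
    """Bisect for the shortest interval whose loss rate stays below threshold.

    Assumes loss decreases monotonically as the interval grows and that
    `hi` is already slow enough to be acceptable.
    """
    if probe(hi) >= threshold:
        raise ValueError("even the longest interval loses too many packets")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if probe(mid) < threshold:
            hi = mid      # loss acceptable: try sending faster
        else:
            lo = mid      # too much loss: back off
    return hi


interval = shortest_ok_interval(simulated_probe, lo=1e-4, hi=1.0, threshold=0.01)
available_rate = PACKET_BITS / interval   # about 2.02e6 bit/s for this toy link
print(available_rate)
```

Bisection needs far fewer probe bursts than fixed steps when the starting guess is far from the answer. The trade-off is that each probe runs at a very different rate from the one before it.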

Returning to the flowchart of FIG. 7: the CPU 101a performs the available bandwidth measurement and calculates the maximum transmission rate usable on the communication network 10 (S110).

Next, the CPU decides which area's video to send. First, the CPU 101a determines whether the video data for the all area can be sent at the calculated transmission rate (S120). If it can, the CPU writes the coordinates of the all area into the area information field of the RTP packet header and the all-area video data into the payload. It then sends the resulting video data transmission message to the communication device 100b via the communication IF 109a (S121). This video transmission continues for a fixed period.

If the all-area video data is too large, the CPU 101a determines whether the video data would fit the calculated transmission rate if only the portion corresponding to the active area were cut out (S130). If it would, the CPU proceeds as above. It writes the active area's coordinates into the area information field of the RTP header and the active-area video data cut out by the CPU 101a into the payload, then sends the video data transmission message to the communication device 100b via the communication IF 109a (S131). This video transmission continues for a fixed period.

If even the active-area video data is too large, the CPU 101a tries to send video data for a personal area, which is smaller still. There are several personal areas, so it first selects one (S140): the personal area of the participant in room a who is currently speaking (hereinafter, the speaker). As described above, the microphone array of the audio input unit 104a picks up the speaker's voice, and the CPU 101a analyzes the audio data to determine the speaker's direction (the horizontal angle θ). The personal area whose coordinates correspond to that direction is then selected.

The horizontal angle θ is mapped to personal-area coordinates as follows. Because θ describes only the horizontal direction, it can be converted into an x coordinate, the horizontal coordinate. FIG. 8 illustrates this conversion and shows the shooting range viewed from the y direction. α is the light-receiving surface of the image sensor of the video input unit 105. The focal point F is the point where the lines joining each edge of the shooting range to the corresponding edge of α intersect. The left end point O of α is the origin of the x coordinate, and the right end has x coordinate xmax. In this embodiment, the speaker's horizontal angle θ identified with the microphone array is approximately equal to the angle measured from FM, the perpendicular from the focal point F to α. The line from F toward the speaker intersects α at the point P, whose coordinate is xs = xm + fa × tan θ, where xm = xmax / 2 is the coordinate of the center M of the light-receiving surface and fa is the distance between M and F.
The CPU 101a compares the coordinate xs obtained in this way with the coordinates xL and xR of each personal area and selects the personal area for which xs lies between xL and xR. For example, if xL204 ≤ xs ≤ xR204, PSA 204 is selected.
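The conversion xs = xm + fa × tan θ and the xL ≤ xs ≤ xR lookup can be sketched as follows. This sketch assumes that fa is expressed in the same units as x (sensor pixels here) and that positive θ points toward increasing x. The area layout and numeric values are hypothetical.

```python
import math


def speaker_x(theta_deg, x_max, fa):
    """xs = xm + fa * tan(theta), with xm = x_max / 2 the sensor centre M."""
    xm = x_max / 2
    # Assumed sign convention: positive theta points toward increasing x.
    return xm + fa * math.tan(math.radians(theta_deg))


def select_personal_area(xs, areas):
    """Return the first personal area whose [xL, xR] range contains xs."""
    for name, (x_left, x_right) in areas.items():
        if x_left <= xs <= x_right:
            return name
    return None  # speaker falls outside every personal area


# Hypothetical layout in sensor pixels (xL, xR) for five participants.
areas = {
    "PSA201": (0, 250), "PSA202": (260, 500), "PSA203": (510, 760),
    "PSA204": (770, 1050), "PSA205": (1060, 1280),
}
xs = speaker_x(theta_deg=20, x_max=1280, fa=1100)
print(round(xs), select_personal_area(xs, areas))  # 1040 PSA204
```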

Returning to FIG. 7: after selecting the personal area as described above, the CPU 101a determines whether the video data would fit the calculated transmission rate if only the portion corresponding to the selected personal area were cut out (S150). If it would, the CPU proceeds as above. It writes that personal area's coordinates into the area information field of the RTP header and the personal-area video data cut out by the CPU 101a into the payload, then sends the video data transmission message to the communication device 100b via the communication IF 109a (S151). This video transmission continues for a fixed period.

If even the personal-area video data is too large, the CPU 101a compresses the personal-area video data until it fits the calculated transmission rate. It writes the personal area's coordinates into the area information field of the RTP header and the compressed video data into the payload, then sends the video data transmission message to the communication device 100b via the communication IF 109a (S160). This video transmission continues for a fixed period. The video data is compressed by reducing the number of frames, thinning out pixels, or reducing the number of colors or gray levels. When such video is displayed on the display unit 107b of the communication device 100b as described later, its appearance changes noticeably. Participants can therefore also see for themselves that the available bandwidth has narrowed.
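The S120 → S130 → S150 → S160 cascade tries progressively smaller areas and compresses only as a last resort. The sketch below shows this, assuming each candidate area's required bit rate is already known. The class, the function and the returned "scale factor" (the fraction of the original bit rate the compressed video must reach) are hypothetical illustrations, not the patent's data structures.

```python
from dataclasses import dataclass


@dataclass
class AreaCandidate:
    name: str
    required_bps: float  # bit rate needed to send this area's video uncompressed


def choose_area(rate_bps, all_area, active_area, personal_area):
    """Return (area name, scale factor) following S120 -> S130 -> S150 -> S160.

    A scale factor of 1.0 means "send as is". A value below 1.0 is the
    fraction of the original bit rate the personal-area video must be
    compressed to (by dropping frames, pixels, colours or grey levels).
    """
    for candidate in (all_area, active_area, personal_area):
        if candidate.required_bps <= rate_bps:
            return candidate.name, 1.0
    return personal_area.name, rate_bps / personal_area.required_bps


# Hypothetical bit-rate requirements.
all_area = AreaCandidate("ALL", 8_000_000)
active_area = AreaCandidate("ACTIVE", 4_000_000)
psa202 = AreaCandidate("PSA202", 1_000_000)
print(choose_area(5_000_000, all_area, active_area, psa202))  # ('ACTIVE', 1.0)
print(choose_area(500_000, all_area, active_area, psa202))    # ('PSA202', 0.5)
```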

Through these steps, the device sends an amount of video data that fits the appropriate transmission rate obtained from the available bandwidth measurement. When the amount of video data to be sent changes, the CPU has three options when generating the RTP packets. It may vary the amount of video data written into each payload while keeping the packet interval constant. It may keep the payload size constant and vary the interval. Or it may vary both. The CPU 101a may choose whichever method is most suitable when generating the RTP packets.
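The first two packetization options above are the same rate equation solved for different unknowns: rate = payload bits / interval. This is a minimal sketch that ignores RTP/UDP/IP header overhead, including the area information field, which a real sender would have to subtract from the budget. The function names are hypothetical.

```python
def interval_for_payload(rate_bps, payload_bytes):
    """Keep the payload size fixed and derive the packet interval (seconds)."""
    return payload_bytes * 8 / rate_bps


def payload_for_interval(rate_bps, interval_s):
    """Keep the packet interval fixed and derive the payload size (bytes)."""
    return int(rate_bps * interval_s // 8)


# Sending 1 Mbit/s of video (header overhead ignored):
print(interval_for_payload(1_000_000, 1250))  # 0.01 -> one 1250-byte packet every 10 ms
print(payload_for_interval(1_000_000, 0.02))  # 2500 -> 2500 bytes every 20 ms
```

Varying both at once, the third option, means choosing one of the two values by some other rule and solving for the other.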

After the video data has been sent for the fixed period, the CPU 101a determines whether communication has ended (S170). If it has not, the CPU repeats the process from the available bandwidth measurement (S110). If it has, the CPU stops sending video data. Communication is judged to have ended when a participant in either room operates the operation unit 106 of the communication device 100 to instruct that communication be terminated.

That concludes the description of the video transmission method. The following describes what happens from the time the communication device 100b receives the video data sent by the communication device 100a until the video is displayed on the display unit 107b.

When communication starts, the communication device 100b first receives the all-area video data, and the CPU 101b displays that video on the display unit 107b. The communication device 100b then receives video data continuously. The CPU 101b reads the upper-left and lower-right coordinates written in the area information field of the RTP header to determine which area of the screen the video data belongs to.

The CPU 101b stores the immediately preceding video data (corresponding to the image of the last frame) in the RAM 103b. It places the received video data at the screen area it identified and composites it onto the stored preceding video data to generate new video data. It then outputs the new video data to the display unit 107b and displays it. This process is repeated each time an IP packet arrives from the communication device 100a.

As a result, the display unit 107b freezes the video that had been shown until that moment and overwrites only the part corresponding to the received coordinates. For example, suppose video data for personal area PSA 202 arrives while all-area video is being displayed. The all-area video stops and becomes a still image of the frame being played at that moment, and the video for the PSA 202 portion plays over it. Everything outside PSA 202 remains frozen like a still image. In this way, the display unit 107b of the communication device 100b shows the video of the area given by the received area information, successively overwriting the previous image.
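The receiver-side overwrite can be sketched as pasting the received patch into a copy of the frozen previous frame. This sketch represents frames as lists of pixel rows. It assumes the lower-right corner in the area information is exclusive, because the text only says that the upper-left and lower-right coordinates are carried. The function name is hypothetical.

```python
def composite(previous, patch, area):
    """Paste `patch` over `previous` at `area` and return the new frame.

    `area` is (x0, y0, x1, y1): upper-left corner inclusive, lower-right
    corner exclusive (an assumption, not stated in the text).
    """
    x0, y0, x1, y1 = area
    if len(patch) != y1 - y0 or any(len(row) != x1 - x0 for row in patch):
        raise ValueError("patch size does not match the area information")
    frame = [row[:] for row in previous]       # keep the stored frame intact
    for dy, patch_row in enumerate(patch):
        frame[y0 + dy][x0:x1] = patch_row      # overwrite only the received area
    return frame


previous = [[0] * 4 for _ in range(3)]         # last displayed frame (frozen)
patch = [[7, 7], [7, 7]]                       # newly received personal-area video
for row in composite(previous, patch, (1, 0, 3, 2)):
    print(row)
# [0, 7, 7, 0]
# [0, 7, 7, 0]
# [0, 0, 0, 0]
```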

The above describes transmission from the communication device 100a to the communication device 100b. The same processing is performed from the communication device 100b to the communication device 100a, so video flows in both directions between the two devices. Audio data picked up and generated by the audio input unit 104 is also exchanged in both directions, in parallel with the video data.

In this way, each communication device 100 measures the bandwidth available on the communication network 10 and selects an area whose video data fits the calculated transmission rate. The communication devices 100a and 100b exchange such video in both directions and display each piece of video data they receive by overwriting the previous image. This reduces packet loss in the audio data communicated in parallel, and therefore reduces audio dropouts, so remote conferences can be conducted efficiently.

An embodiment of the present invention has been described above, but the invention can also be implemented in various other forms, as follows.

<Modification 1>
In the embodiment, when the audio input unit 104 picks up sound from several speakers, the CPU 101a treats the direction of the loudest sound as the speaker's direction. It may instead calculate the direction of each speaker. In that case, the CPU 101a selects several personal areas and may send video data cut out from each of them. Alternatively, instead of sending video data for several personal areas, it may generate an active area from those personal areas by the method described above and send the video data cut out from that active area. This copes with multiple speakers while still matching the video data to the transmission rate calculated by the CPU 101a.

<Modification 2>
In the embodiment, the personal areas set by the participants do not overlap in x, but they may be set so that they overlap. For example, PSA 202 and PSA 203 may overlap as shown in FIG. 9, that is, xL203 may be smaller than xR202. In that case, the CPU 101a may treat the center of the overlap, x = (xR202 + xL203) / 2, as the boundary between PSA 202 and PSA 203. Alternatively, instead of splitting at the center, the CPU 101a may act when the speaker is calculated to be within the overlapping portion. It then generates an active area from PSA 202 and PSA 203, as indicated by the two-dot chain line in the figure, and sends the video data for that active area. This gives the same effects as the embodiment even when there are many participants and the personal areas are hard to set.

<Modification 3>
In the embodiment, the audio input unit 104 uses a microphone array, which the CPU 101 uses to calculate the speaker's direction. Instead of the microphone array, a microphone may be placed in front of each participant, and the CPU 101 may identify the speaker from the volume of the audio data generated by each microphone. In that case, the participants operate the operation unit 106 to register with the CPU 101 which microphone corresponds to which personal area. This allows the speaker's position to be identified more accurately.

<Modification 4>
In the embodiment, the area information field in the RTP packet header carries the coordinates of the transmitted video data. Instead of coordinates, it may carry a number assigned to each area. In that case, whenever the communication devices 100 exchange personal-area coordinate information, they attach an area number to each personal area. This reduces the data size of the RTP packets and therefore the load on the communication network 10.

<Modification 5>
In the embodiment, the only active area is the all-active area made up of all the personal areas. If the personal areas are unevenly distributed, active areas may also be generated from groups of personal areas that reflect that distribution. For example, with the participant arrangement shown in FIG. 5, the left and right sides may each be treated as a group. Active area L is then generated from PSA 201, PSA 202 and PSA 203, and active area R from PSA 204 and PSA 205. Suppose PSA 202 is selected as the personal area in the speaker's direction during video transmission. Before sending the PSA 202 video data, the CPU 101a uses the calculated transmission rate to determine whether the video data for active area L, which contains PSA 202, could be sent instead. If it could, the CPU sends the active area L video data in place of the PSA 202 video data. The all-active-area video may be too large while the personal-area video uses less than the rate allows. In that case, this lets the device send video for an area of intermediate size and keep the display quality high on the receiving side.

<Modification 6>
In the embodiment, the personal areas for participants in room a are set by participants in room b, but the participants in room a may set them themselves. In that case, during personal-area setup, the video captured by the video input unit 105a of the communication device 100a is shown on the display unit 107a of the same device, so participants in room a can make the settings while watching their own image. Setup becomes easier still if the video is flipped horizontally, so that the display unit 107a behaves like a mirror.

<Modification 7>
In the embodiment, the available bandwidth is measured and the area whose video data best fits is selected and sent. However, the all-area video data need not be sent again after its initial transmission. Alternatively, the all-area video data may be sent at predetermined intervals. Or the CPU 101 may assess image similarity at predetermined intervals and send it whenever the image has changed by more than a predetermined amount. The all-area video data need only be a single frame (still-image data is acceptable), so it adds almost no load to the communication network 10. If the calculated transmission rate is so low that sending the all-area video data would interrupt the audio, the CPU 101 waits until the transmission rate first becomes high enough and then sends that video data.

<Modification 8>
In the embodiment, the image sensor of the video input unit 105 and the microphone array of the audio input unit 104 are fixed to the communication device 100, but they may be movable independently of each other. In that case, the participants operate the operation unit 106 to register with the CPU 101 how the image sensor's shooting range corresponds to the microphone array's directions. This allows the communication device 100 to be installed in a variety of configurations.

<Modification 9>
In the embodiment, depending on the calculated transmission rate, the personal-area video data is compressed to reduce its size before being sent. The calculated transmission rate may be comfortably high enough for personal-area video data but slightly too low for active-area video data. In that case, the active-area video data may be compressed and sent instead. Degrading the video only to a degree that participants do not notice allows video covering the whole active area to be transmitted.

<Modification 10>
In the embodiment, the entire shooting range is treated as the all area. However, the all area simply denotes the area used when sending video data that has not been cut out to the other communication device, so it need not be the entire shooting range.

<Modification 11>
In the embodiment, the available bandwidth measurement is performed at fixed intervals. It may instead be performed at randomly determined intervals, or at intervals that change according to how the calculated transmission rate behaves. For example, while the transmission rate stays nearly constant, the CPU 101 may lengthen the interval between measurements. This reduces the number of measurements for as long as the transmission rate remains stable.

<Modification 12>
In the embodiment, the video data for the selected area is cut out and sent to the other communication device 100. Instead, video data may be generated in advance for every area, not only the selected one, and the video data for the selected area sent to the other communication device 100. Then, even a CPU 101 with low processing capability needs no image processing at the moment the selected area changes. It simply picks the corresponding video data and sends it.

FIG. 1 is a block diagram showing the configuration of the communication system.
FIG. 2 is a block diagram showing the configuration of the communication device according to the embodiment.
FIG. 3 is an explanatory diagram of the calculation of the distance and direction between the microphone array and a sound source.
FIG. 4 is an explanatory diagram showing the relationship between the shooting range of the image sensor and the direction of a sound source.
FIG. 5 is an explanatory diagram showing the screen display in room b.
FIG. 6 is an explanatory diagram showing the screen display in room a.
FIG. 7 is a flowchart explaining the video data transmission process.
FIG. 8 is an explanatory diagram of the calculation that expresses the speaker's direction as an x coordinate.
FIG. 9 is an explanatory diagram of the personal areas and active area according to Modification 2.

Explanation of symbols

1: communication system; 10: communication network; 100: communication device; 101: CPU; 102: ROM; 103: RAM; 104: audio input unit; 1041 to 1044: microphones; 105: video input unit; 106: operation unit; 107: display unit; 108: audio output unit; 109: communication IF; 110: bus; 201 to 205, 301, 302, 303: participants; 200, 300: desks

Claims

A communication device connected to a communication network for communicating with another communication device, comprising:
imaging means for capturing video of a first area within its imaging range and generating it as first video data;
area setting means for setting a plurality of partial ranges of the first area as second areas;
area generating means for generating one or more third areas by combining a plurality of the second areas;
sound collection means for collecting sound from a sound source and generating sound data;
sound source direction specifying means for specifying the direction of the sound source based on the sound data generated by the sound collection means;
selecting means for selecting, from the plurality of set second areas, a second area that includes a position corresponding to the specified direction of the sound source;
usable bandwidth measuring means for repeatedly measuring, at predetermined time intervals, a transmission rate at which transmission to the other communication device via the communication network succeeds with at least a predetermined probability;
selection means for selecting, based on the transmission rate, one of the first area, the second area selected by the selecting means, and a third area that includes the second area selected by the selecting means;
video data cutting means for cutting out the video of the second area from the first video data to generate second video data, and for cutting out the video of the third area from the first video data to generate third video data; and
transmission means for transmitting, to the other communication device, information indicating the area selected by the selection means together with the video data corresponding to that area.
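Claim 1 describes the area choice only in functional terms. The Python sketch below is one hypothetical reading of it and is not the patent's implementation. It rests on several assumptions:

- Second areas are horizontal pixel strips of the first video.
- A third area is a tuple of adjacent second areas.
- The usable rate is the highest probed rate whose success ratio meets the predetermined probability.
- The speaker's direction maps to a pixel column through an ideal pinhole camera model.

The function names `usable_rate`, `direction_to_x`, `select_second_area` and `select_area`, the probe format, and both rate thresholds are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Horizontal pixel range [x0, x1) of the first (full) video."""
    name: str
    x0: int
    x1: int


def usable_rate(probes, min_success):
    """Highest probed rate (bit/s) whose success ratio is at least min_success.

    `probes` maps rate -> (successes, attempts); this format is hypothetical."""
    ok = [rate for rate, (succ, tries) in probes.items()
          if tries > 0 and succ / tries >= min_success]
    return max(ok, default=0)


def direction_to_x(angle_deg, width, hfov_deg):
    """Pixel column for a horizontal source angle (0 = optical axis),
    assuming an ideal pinhole camera aligned with the microphone array."""
    focal = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    x = width / 2 + focal * math.tan(math.radians(angle_deg))
    return min(max(int(x), 0), width - 1)


def select_second_area(second_areas, x):
    """Second area whose range contains column x, or None if none does."""
    return next((a for a in second_areas if a.x0 <= x < a.x1), None)


def select_area(rate, first, second, third_areas, high_rate, mid_rate):
    """Whole frame, a third area containing `second`, or `second` alone.

    Thresholds are hypothetical: a higher usable rate allows a larger area.
    Each third area is a tuple of adjacent second areas; its extent is
    taken as their bounding range."""
    if rate >= high_rate:
        return first
    if rate >= mid_rate:
        for combo in third_areas:
            if second in combo:
                return Area("+".join(a.name for a in combo),
                            min(a.x0 for a in combo), max(a.x1 for a in combo))
    return second


full = Area("full", 0, 1280)
seats = [Area("A", 0, 320), Area("B", 320, 640),
         Area("C", 640, 960), Area("D", 960, 1280)]
pairs = [(seats[0], seats[1]), (seats[2], seats[3])]

rate = usable_rate({1_000_000: (99, 100), 2_000_000: (90, 100)}, 0.95)
speaker = select_second_area(seats, direction_to_x(20.0, 1280, 90.0))
print(select_area(rate, full, speaker, pairs, 2_000_000, 500_000))
```

In this example the 2 Mbit/s probe fails the 95 % criterion, so the usable rate is 1 Mbit/s. A speaker at 20 degrees lands in strip C. The mid-range rate therefore selects the combined third area C+D rather than the whole frame.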

The communication device according to claim 1, wherein
the sound collection means has a plurality of microphones, and
the sound source direction specifying means specifies the direction of the sound source based on the respective sound data generated by the plurality of microphones picking up sound from the sound source.
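A common way to derive a direction from several microphone signals is to estimate the time difference of arrival (TDOA) for one microphone pair. The pair's signals are cross-correlated, and under a far-field assumption the lag is converted to an angle with asin(c·τ/d). The claim does not say this is the method the device uses, so treat the sketch as an illustration only. The speed of sound, microphone spacing, sample rate, and the helper names `best_lag` and `arrival_angle_deg` are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def best_lag(ref, sig, max_lag):
    """Lag in samples by which `sig` trails `ref`, found by maximising the
    cross-correlation sum(ref[i] * sig[i + lag]) over -max_lag..max_lag."""
    def corr(lag):
        return sum(ref[i] * sig[i + lag]
                   for i in range(len(ref)) if 0 <= i + lag < len(sig))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle_deg(ref, sig, spacing_m, sample_rate):
    """Far-field arrival angle, in degrees from broadside, for one microphone pair.

    Positive angles mean the sound reached the `ref` microphone first."""
    max_lag = math.ceil(spacing_m / SPEED_OF_SOUND * sample_rate)  # physical limit
    tau = best_lag(ref, sig, max_lag) / sample_rate   # time difference in seconds
    s = SPEED_OF_SOUND * tau / spacing_m
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))  # clamp rounding/noise


# Synthetic check: an impulse that reaches the second microphone 3 samples later.
fs = 16_000
mic1 = [0.0] * 64
mic2 = [0.0] * 64
mic1[20] = 1.0
mic2[23] = 1.0
angle = arrival_angle_deg(mic1, mic2, spacing_m=0.10, sample_rate=fs)
print(round(angle, 1))
```

With the four microphones 1041 to 1044, such pairwise estimates could in principle be combined, for example by averaging or intersecting bearings. How the embodiment actually combines them is not stated in this section.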

The communication device according to claim 1, wherein the video data cutting means generates, based on the transmission rate, third video data compressed so as to reduce its data amount.

The communication device according to any one of claims 1 to 3, wherein the video data cutting means generates, based on the transmission rate, second video data compressed so as to reduce its data amount.
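Claims 3 and 4 only state that the compression of the cut-out video depends on the transmission rate. One hypothetical policy is to budget rate ÷ frame rate bits per frame, then choose the best quality level whose estimated frame size fits. If none fits, the resolution is halved and the check repeats. The bits-per-pixel table, the quality names, the scale ladder and `plan_compression` are all assumptions for illustration; they are not taken from the patent or from any real codec.

```python
# Hypothetical encoder profile: approximate compressed bits per pixel at each
# quality level, best quality first. Real figures depend on the codec and scene.
BITS_PER_PIXEL = [("high", 0.30), ("medium", 0.15), ("low", 0.07)]
SCALES = (1, 2, 4, 8)  # downscale factors tried in order (assumed)


def plan_compression(width, height, rate_bps, fps):
    """Pick (quality, scale) so that one frame's estimated size fits the
    per-frame budget rate_bps / fps; prefer full resolution, then quality."""
    budget = rate_bps / fps
    for scale in SCALES:
        pixels = (width // scale) * (height // scale)
        for quality, bpp in BITS_PER_PIXEL:
            if pixels * bpp <= budget:
                return quality, scale
    return BITS_PER_PIXEL[-1][0], SCALES[-1]  # smallest setting as a fallback


print(plan_compression(640, 720, 1_000_000, 10))  # generous link
print(plan_compression(640, 720, 200_000, 10))    # constrained link
```

This sketch keeps full resolution as long as any quality level fits and lowers quality before resolution. The opposite trade-off is equally plausible, and the claims do not favour either.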

The communication device according to any one of claims 1 to 4, further comprising:
receiving means for receiving, from the other communication device, information indicating an area and video data corresponding to that area; and
video data generating means for generating, when the receiving means receives the information indicating the area and the video data, fourth video data in which the image obtained by reproducing the received video data is displayed overwriting the range, corresponding to the received area, of the video obtained by reproducing the video data received immediately before that information.
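On the receiving side, claim 5 amounts to pasting the newly received area over the matching rectangle of the previously displayed frame. The sketch below shows that overwrite step. It assumes that frames are plain lists of pixel rows and that the received area information has already been resolved to a top-left corner (x0, y0). The helper name `compose` and that representation are hypothetical.

```python
def compose(previous, patch, x0, y0):
    """Return a new frame equal to `previous` with `patch` written over the
    rectangle whose top-left corner is (x0, y0). Frames are lists of pixel rows."""
    height, width = len(previous), len(previous[0])
    fits = (x0 >= 0 and y0 >= 0 and y0 + len(patch) <= height
            and all(x0 + len(row) <= width for row in patch))
    if not fits:
        raise ValueError("patch does not fit inside the previous frame")
    frame = [row[:] for row in previous]        # leave the previous frame intact
    for dy, patch_row in enumerate(patch):
        frame[y0 + dy][x0:x0 + len(patch_row)] = patch_row
    return frame


previous = [[0] * 4 for _ in range(3)]
shown = compose(previous, [[7, 7], [7, 7]], x0=1, y0=1)
for row in shown:
    print(row)
```

Because only the received rectangle changes, the regions outside the selected area keep showing the last video received for them.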