JP2009159504A

JP2009159504A - Video conference system, video conference method, and program

Info

Publication number: JP2009159504A
Application number: JP2007337742A
Authority: JP
Inventors: Masayuki Imanishi; 将之今西
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-12-27
Filing date: 2007-12-27
Publication date: 2009-07-16

Abstract

PROBLEM TO BE SOLVED: To select an optimal codec and multi-point control unit and to perform transcoding.

SOLUTION: A video conference system uses an overlay network whose nodes are allocated onto a hash space by distributed hash table technology. A first video/audio processor 10 includes a first storage part that stores information on the codecs it can process, and a video processing part 5 and an audio processing part 6 that encode and decode data with a first codec. A first MCU (Multi-point Control Unit) 50 includes a second storage part that stores information on the codecs processable by adjacent nodes, and a video transcode part 51 and an audio transcode part 52 that, on the basis of that stored information, transcode data processed with the first codec into a second codec.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a video conference system, a video conference method, and a program that transcode data to an appropriate codec when, for example, the data is transmitted and received between a plurality of devices that can process different codecs.

Conventionally, to help conferences held simultaneously at remote locations run smoothly, video conference systems installed in the respective conference rooms have been used so that speakers can talk to one another and see one another. Such a video conference system includes a plurality of video/audio processing devices that can show the state of each conference room and emit the speech of each speaker.

Each video/audio processing device includes a microphone that picks up audio during the conference, a camera that captures the speaker, a signal processing unit that applies predetermined processing to the speaker's audio picked up by the microphone, a display unit that shows speakers talking in the other conference rooms, a loudspeaker that emits the speakers' speech, and so on. The video/audio processing devices installed in the conference rooms are connected via a communication line. By sending and receiving the recorded video/audio data to and from one another, they display the state of each conference room and emit the speech.

A multipoint connection device (hereinafter also simply called an MCU (Multi-point Control Unit)) is used to transcode data to a codec that the video conference systems installed at a plurality of sites can process. With an MCU, a video conference can be held using a video conference system made up of video/audio processing devices installed at multiple sites. When the video/audio processing devices can handle different codecs, the MCU transcodes the data being exchanged so that each device can use the data sent by the others.

Patent Document 1 discloses a technique that uses an MCU to notify child terminals of the video of a parent terminal.
Patent Document 1: JP 2000-23129 A

However, transcoding places a heavy processing load on the MCU, which can interfere with real-time transmission and reception of data. Transcoding the codec of the transmitted data also degrades image and sound quality. If the codec information of every video/audio processing device were known in advance, codecs could be selected so as to reduce the amount of transcoding. However, providing a management server or the like to manage all the codec information usable by all the video/audio processing devices incurs cost and is therefore not a good solution. Because transcoding is a heavy process, it is desirable to avoid it as far as possible.

The present invention has been made in view of this situation, and its object is to select an optimal codec and perform transcoding.

The present invention applies to the case where an overlay network is used in which at least one node is allocated onto a hash space by distributed hash table technology, and the nodes include a plurality of data processing devices that encode and decode data and a multipoint connection device that transcodes data encoded by the data processing devices into another codec. Each data processing device stores information on the codecs it can process and, based on the stored codec information, encodes data with a first codec. The multipoint connection device stores information on the codecs that adjacent nodes can process and, based on that information, transcodes the data processed with the first codec into a second codec.

This makes it possible to select a usable codec for each video/audio processing device arranged at each node and to transcode data in the multipoint connection device.

According to the present invention, the overlay network is used to manage the codecs usable by the video/audio processing devices and the codec information that the multipoint connection devices can transcode. An appropriate multipoint connection device can therefore be selected, and transcoding can be performed with a small processing load.

An embodiment of the present invention is described below with reference to the accompanying drawings. In this embodiment, the video/audio processing system that processes video data and audio data is described as applied to a video conference system 100 that can send and receive video data and audio data between remote locations in real time. The video conference system 100 is used, for example, as a conference system for holding a conference at multiple sites simultaneously.

First, a configuration example of the network in which the video conference system 100 is realized is described with reference to FIG. 1. The video/audio processing devices and the MCUs form an overlay network as nodes. Each video/audio processing device and each MCU holds the codec information of its adjacent nodes.

FIG. 1A is a diagram showing a configuration example of the network of the video conference system 100. The video conference system 100 consists of nodes at multiple sites that are connected to one another by communication lines.
The video conference system 100 consists of a first video/audio processing device 10 through a fourth video/audio processing device 40, a first MCU 50, and a second MCU 60. Together, these devices and MCUs form an overlay network. Each MCU holds information on the codecs that the video/audio processing devices can process and information on the codecs that the other MCUs can transcode. Therefore, when a video conference is held at multiple sites simultaneously, for example, an MCU can select the optimal codec usable by each video/audio processing device and transcode to it.

FIG. 1B is a diagram showing an example of the codecs that each MCU can transcode.
The codecs that the first MCU 50 can process are H.264, MPEG (Moving Picture Experts Group) 4, and MPEG2.
The codecs that the second MCU 60 can process are MPEG4, MPEG2, and MPEG1.

FIG. 1C is a diagram showing an example of the codecs (compression encoding methods) that each video/audio processing device can process.
The codec that the first video/audio processing device 10 can process is H.264.
The codecs that the second video/audio processing device 20 can process are H.264 and MPEG4.
The codecs that the third video/audio processing device 30 can process are MPEG4 and MPEG2.
The codec that the fourth video/audio processing device 40 can process is MPEG2.
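
The capability tables of FIG. 1B and FIG. 1C can be written down as data, which makes it easy to see when transcoding is needed at all. The following is a minimal sketch in Python, not the method claimed in this document: the node labels, the helper `plan_route`, and its tie-breaking rule (alphabetical order) are assumptions made for illustration.

```python
# Codec capabilities copied from FIG. 1B and FIG. 1C of the description.
# The string labels ("MCU50", "device10", ...) are illustrative names.
MCU_CODECS = {
    "MCU50": {"H.264", "MPEG4", "MPEG2"},
    "MCU60": {"MPEG4", "MPEG2", "MPEG1"},
}
DEVICE_CODECS = {
    "device10": {"H.264"},
    "device20": {"H.264", "MPEG4"},
    "device30": {"MPEG4", "MPEG2"},
    "device40": {"MPEG2"},
}


def plan_route(sender, receiver):
    """Return (mcu, source_codec, target_codec).

    mcu is None when both devices share a codec and no transcoding is needed.
    Hypothetical helper: the selection rule is an assumption, not the patent's.
    """
    shared = DEVICE_CODECS[sender] & DEVICE_CODECS[receiver]
    if shared:
        codec = sorted(shared)[0]
        return (None, codec, codec)
    for mcu in sorted(MCU_CODECS):
        supported = MCU_CODECS[mcu]
        sources = sorted(DEVICE_CODECS[sender] & supported)
        targets = sorted(DEVICE_CODECS[receiver] & supported)
        if sources and targets:
            return (mcu, sources[0], targets[0])
    raise LookupError("no single MCU can bridge these devices")
```

With these tables, devices 20 and 30 share MPEG4 and need no MCU, whereas devices 10 and 40 share no codec and need an MCU that handles both H.264 and MPEG2, which here is only the first MCU 50.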

The video conference system 100 according to this embodiment is characterized by a connection form called peer-to-peer (hereinafter P2P), in which every node (peer node) connected to the network can change its role dynamically.
Data that a server holds centrally in a conventional client-server system is instead distributed across the peer nodes in a P2P system. The peer nodes therefore cooperate with one another to find where data is stored.

A known technique for finding where data is stored is the distributed hash table (hereinafter DHT: Distributed Hash Table). In a DHT, the storage location information of a piece of data is registered at a node whose hash value is close to the hash value generated from the data. The same hash function is used to generate the hash values of the data and of the nodes. Each node at which storage location information is registered holds pairs of a data hash value and the corresponding storage location information as a table (hash table).

Because different source data produce entirely different hash values, the nodes at which storage location information is registered are also spread across the network. In other words, the hash table is distributed over the peer nodes, so the load on each peer node is distributed as well.
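
The registration rule described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the description does not name a hash function, the size of the hash space, or a distance metric, so SHA-1, a 16-bit space, and ring distance are choices made here, and `hash_id`, `responsible_node`, and `register` are hypothetical names.

```python
import hashlib

HASH_BITS = 16  # assumption: a small hash space keeps the example readable


def hash_id(name):
    """Map a node name or a data key into the same hash space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << HASH_BITS)


def distance(a, b):
    """Assumed metric: shortest way around a ring of size 2**HASH_BITS."""
    d = abs(a - b)
    return min(d, (1 << HASH_BITS) - d)


def responsible_node(key, node_names):
    """Pick the node whose hash is closest to the key's hash."""
    key_hash = hash_id(key)
    return min(node_names, key=lambda n: (distance(hash_id(n), key_hash), n))


def register(tables, key, location, node_names):
    """Store (hash of key -> storage location) in the responsible node's table."""
    node = responsible_node(key, node_names)
    tables.setdefault(node, {})[hash_id(key)] = location
    return node
```

Because node names and data keys pass through the same `hash_id`, any peer can compute which node is responsible for a key without consulting a central server.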

To search for data, the hash value of the data is calculated and used as the search key. In a DHT, every node in the network is given in advance a routing table that lists routes to nearby nodes, and in that routing table the distance between nodes is expressed by the hash values of the nodes.

To refer to data registered at some node on the network, a node first computes the hash value of the data and sends a search request to the node in its own routing table whose hash value is closest to the data's hash value. If the node that receives the search request does not hold the storage location information, it in turn sends the search request to the node in its own routing table whose hash value is closest to the data's hash value. Repeating this operation narrows the search range until the storage location information of the desired data is obtained. Once the storage location is known, the actual data can be retrieved from it. On an overlay network built with DHT technology, data can therefore be accessed without knowing where it actually resides.
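
The repeated forwarding described above can be sketched as a greedy lookup. This Python sketch uses small integer node IDs in an assumed 256-value hash space; the routing tables, the stored entry, and the function `lookup` are hypothetical, and real DHTs differ in how routing tables are built.

```python
def distance(a, b, space=256):
    """Assumed ring distance in a hash space of the given size."""
    d = abs(a - b)
    return min(d, space - d)


# Hypothetical routing tables: node ID -> IDs of the nearby nodes it knows.
ROUTING = {10: [80, 200], 80: [10, 150], 150: [80, 200], 200: [150, 10]}
# Hypothetical hash tables: node ID -> {data hash: storage location}.
STORED = {150: {145: "device30"}}


def lookup(start, key_hash, routing, stored):
    """Forward the request to ever-closer nodes; return (location, path)."""
    current, path = start, [start]
    while key_hash not in stored.get(current, {}):
        nxt = min(routing[current], key=lambda n: (distance(n, key_hash), n))
        if distance(nxt, key_hash) >= distance(current, key_hash):
            raise KeyError("no closer node; key is not registered")
        current = nxt
        path.append(current)
    return stored[current][key_hash], path
```

Starting from node 10 and searching for hash 145, the request is forwarded to node 200 and then to node 150, which holds the entry. Each hop strictly reduces the distance to the key, so the loop always terminates.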

Because routing in a DHT is based on the distance between peer nodes expressed by their hash values, there is no need to be aware of the segments of the underlying IP network. In other words, an overlay network can be built using a DHT that performs routing in a layer above the IP layer.

Next, an example of the internal configuration of the video conference system 100 is described with reference to FIG. 2.
FIG. 2 is a diagram showing a configuration example in which the first video/audio processing device 10 and the first MCU 50 are connected. Although not shown, a plurality of other video/audio processing devices and MCUs can be connected to the first video/audio processing device 10 and the first MCU 50 via a communication line such as an intranet.

The first video/audio processing device 10 includes an imaging unit 1 that captures the speaker and generates analog video data, and an analog/digital (A/D: Analog/Digital) conversion unit 11 that converts the analog video data supplied from the imaging unit 1 into digital video data.

The imaging unit 1 includes a lens unit 1a and is configured so that image light entering through the lens unit 1a forms an image on the imaging surface of a CCD (Charge Coupled Device) image sensor 1b. The analog video data generated by the imaging unit 1 is supplied to the analog/digital conversion unit 11.

The first video/audio processing device 10 also includes a video processing unit 5 that encodes and decodes digital video data with a predetermined codec, a digital/analog conversion unit 12 that converts the digital video data supplied from the video processing unit 5 into analog video data, and a display unit 3 that amplifies the analog video data supplied from the digital/analog conversion unit 12 with an amplifier (not shown) and displays the video. The video processing unit 5 encodes and decodes the digital video data with the predetermined codec based on the codec information read from a storage unit 17 (see FIG. 3, described later).

The first video/audio processing device 10 also includes a microphone 2 that picks up the speaker's voice and generates analog audio data, and an analog/digital conversion unit 13 that amplifies the analog audio data supplied from the microphone 2 with an amplifier (not shown) and converts it into digital audio data.

The first video/audio processing device 10 also includes an audio processing unit 6 that encodes and decodes digital audio data with a predetermined codec, a digital/analog conversion unit 14 that converts the digital audio data supplied from the audio processing unit 6 into analog audio data, and a loudspeaker 4 that amplifies the analog audio data supplied from the digital/analog conversion unit 14 with an amplifier (not shown) and emits the sound. The audio processing unit 6 encodes and decodes the digital audio data with the predetermined codec based on the codec information read from the storage unit 17 (see FIG. 3, described later).

The first video/audio processing device 10 also includes a plurality of network interfaces 7 through which it sends and receives digital video data and digital audio data to and from adjacent video/audio processing devices and MCUs. The network interface 7 divides the digital video data and digital audio data into packets and transmits them to other video/audio processing devices and MCUs using a predetermined transmission protocol. It also combines packets received from other video/audio processing devices and MCUs to reconstruct the original digital video data and digital audio data.
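
The packet handling of the network interface can be illustrated with a short Python sketch. The description only says that data is divided into packets and recombined under "a predetermined transmission protocol", so the packet layout used here (stream ID, sequence number, total count, payload) and the function names are assumptions for illustration.

```python
def packetize(stream_id, payload, max_size):
    """Split payload into (stream_id, sequence, total, chunk) packets."""
    chunks = [payload[i:i + max_size]
              for i in range(0, len(payload), max_size)] or [b""]
    return [(stream_id, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the payload from packets that may arrive out of order."""
    ordered = sorted(packets, key=lambda p: p[1])
    total = ordered[0][2]
    if [p[1] for p in ordered] != list(range(total)):
        raise ValueError("missing or duplicate packet")
    return b"".join(p[3] for p in ordered)
```

Carrying a sequence number and a total count in every packet lets the receiver restore the original order and detect a lost packet before handing the data to the video or audio processing unit.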

The first MCU 50 includes a video transcoding unit 51 that transcodes the digital video data received from the first video/audio processing device 10 to a predetermined codec, and an audio transcoding unit 52 that transcodes the digital audio data received from the first video/audio processing device 10 to a predetermined codec.

The first MCU 50 also includes a plurality of network interfaces 53 through which it sends and receives digital video data and digital audio data to and from adjacent video/audio processing devices and MCUs. The network interface 53 divides the digital video data and digital audio data into packets and transmits the packets to other video/audio processing devices and MCUs using a predetermined transmission protocol. It also combines packets received from other video/audio processing devices and MCUs to reconstruct the original digital video data and digital audio data.

The digital video data encoded by the video processing unit 5 is sent to the adjacent video/audio processing devices and MCUs via the network interface 7. Digital video data received from the adjacent video/audio processing devices and MCUs is sent to the video processing unit 5 via the network interface 7.

The digital audio data picked up and generated by each video/audio processing device is mixed with the digital audio data supplied from other video/audio processing devices and sent on to the other video/audio processing devices and MCUs. Because the video/audio processing devices and MCUs are connected to one another in this way, video and audio are reproduced between sites in real time. The connecting lines are full duplex, so speakers at the different sites can talk at the same time while seeing one another.

Next, a configuration example of the video/audio processing device is described with reference to FIG. 3.
The internal configuration of the first video/audio processing device 10 is described here. Detailed description of the parts already described with reference to FIG. 2 is omitted. The internal configurations of the second video/audio processing device 20 through the fourth video/audio processing device 40 are the same as that of the first video/audio processing device 10, so their detailed description is also omitted.

The first video/audio processing device 10 includes a control unit 16 that controls each unit, a storage unit 17 that stores digital video data and digital audio data as well as information on the codecs that the video processing unit 5 and the audio processing unit 6 can process, and an input unit 18 that accepts input operations from the user and causes processing to be executed. The input unit 18 includes input devices such as a mouse, a keyboard, and a touch panel. In the following description, codec information means, for example, the name of a codec such as MPEG4, and "to codec" data means to perform encoding and decoding with the codec determined by the codec information (for example, MPEG4).

When another video/audio processing device or an MCU requests processable codec information, the first video/audio processing device 10 sends the requester the codec information that the first video/audio processing device 10 itself can process, the codec information that other video/audio processing devices can process, and the codec information that MCUs can transcode. Conversely, the first video/audio processing device 10 acquires from other video/audio processing devices and MCUs the codec information that those devices can process and the codec information that the MCUs can transcode. The acquired codec information is stored in the storage unit 17. The storage unit 17 also stores unique identification information (hereinafter called ID (Identification) information) for identifying the first video/audio processing device 10.

Next, a configuration example of the MCU is described with reference to FIG. 4.
The internal configuration of the first MCU 50 is described here. Detailed description of the parts already described with reference to FIG. 2 is omitted. The internal configuration of the second MCU 60 is the same as that of the first MCU 50, so its detailed description is also omitted.

The first MCU 50 includes a control unit 54 that controls each unit, and a storage unit 55 that temporarily stores the digital video data and digital audio data passing through the MCU, stores the codecs that adjacent video/audio processing devices can process, and stores the codecs that the first MCU 50 can transcode. The storage unit 55 also stores unique ID information for identifying the first MCU 50.

The first MCU 50 stores in the storage unit 55, in advance, the codecs that adjacent video/audio processing devices and MCUs can process. The storage unit 55 may also hold passing packets after they have been restored to digital video data and digital audio data, so that the data can be transcoded. The first MCU 50 can therefore transcode the received digital video data and digital audio data to the optimal codec.

The first MCU 50 exchanges information with adjacent video/audio processing devices and MCUs at a predetermined cycle. The exchanged information includes the ID information of video/audio processing devices and MCUs newly added to the video conference system 100 and information on the codecs that the video/audio processing devices can process. Based on this information, the first MCU 50 selects the optimal codec and uses the video transcoding unit 51 and the audio transcoding unit 52 to transcode the digital video data and digital audio data temporarily stored in the storage unit 55 from their current codec to another codec. For example, the video transcoding unit 51 and the audio transcoding unit 52 transcode digital video data and digital audio data encoded with MPEG2 into MPEG4. When the data does not need transcoding, the first MCU 50 passes the packetized digital video data and digital audio data through without processing them.
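
The behaviour described for the MCU (periodic information exchange, then either pass-through or transcoding) can be sketched as a small Python class. This is an illustration under assumptions, not the MCU's actual implementation: the class `Mcu`, its method names, and the rule of picking the alphabetically first usable target codec are hypothetical.

```python
class Mcu:
    """Illustrative model of an MCU's codec bookkeeping (names are assumptions)."""

    def __init__(self, transcodable):
        self.transcodable = set(transcodable)
        self.neighbor_codecs = {}  # node ID -> set of codec names

    def update_neighbor(self, node_id, codecs):
        """Record codec information received in a periodic exchange."""
        self.neighbor_codecs[node_id] = set(codecs)

    def forward(self, source_codec, receiver_id):
        """Decide what to do with a stream encoded with source_codec."""
        receiver = self.neighbor_codecs[receiver_id]
        if source_codec in receiver:
            # No transcoding needed: packets pass through untouched.
            return ("pass", source_codec)
        targets = sorted(receiver & self.transcodable)
        if source_codec not in self.transcodable or not targets:
            raise ValueError("cannot transcode for " + receiver_id)
        return ("transcode", targets[0])
```

For example, an MCU with the codec set of the first MCU 50 that has learned a neighbor supports only MPEG4 would return `("transcode", "MPEG4")` for an MPEG2 stream, matching the MPEG2-to-MPEG4 example in the text, and `("pass", "MPEG4")` for an MPEG4 stream.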

Next, a configuration example in which new nodes (video/audio processing devices or MCUs) are added to the network of the video conference system 100 is described with reference to FIG. 5.

FIG. 5 is a diagram showing an example in which an n-th video/audio processing device 110 and an m-th MCU 120 have been added to the video conference system 100.

The nodes adjacent to the first video/audio processing device 10 are the second video/audio processing device 20 and the m-th MCU 120. When the m-th MCU 120 is added to the video conference system 100, information on the codecs that the m-th MCU 120 can process is transmitted to the first video/audio processing device 10 and the second MCU 60. The codec information that the first video/audio processing device 10 and the second video/audio processing device 20 can process, and the codec information that the m-th MCU 120 can transcode, are stored in a first codec management table 15 held by the first video/audio processing device 10. The first codec management table 15 is a table formed in a predetermined area reserved in the storage unit 17 of the first video/audio processing device 10.

Similarly, the nodes adjacent to the second video/audio processing device 20 are the first video/audio processing device 10 and the third video/audio processing device 30. In this example, the codec information that the first video/audio processing device 10, the second video/audio processing device 20, and the third video/audio processing device 30 can process is stored in a second codec management table 25 held by the second video/audio processing device 20. The second codec management table 25 is likewise a table formed in a predetermined area reserved in the storage unit 17 of the second video/audio processing device 20.

Because the overlay network is designed to scale to large networks without a large server-side database, the codec information of all nodes is not managed centrally. Nevertheless, the codec information of any node participating in the overlay network can be reached by following the information held by successive nodes. To the user, the system behaves as if there were a virtual server that manages all codec information centrally.
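
How codec information can be reached by following neighbor tables can be sketched as follows. This Python sketch models the codec management tables 15 and 25 as dictionaries and walks them breadth-first. Assumptions: the entry for the m-th MCU 120 is left out because its codecs are not stated, the table for device 30 is inferred by symmetry, and `find_codecs` is a hypothetical helper rather than the search procedure of this document.

```python
from collections import deque

# Each node's codec management table: its own entry plus its adjacent nodes.
TABLES = {
    "device10": {"device10": {"H.264"}, "device20": {"H.264", "MPEG4"}},
    "device20": {
        "device10": {"H.264"},
        "device20": {"H.264", "MPEG4"},
        "device30": {"MPEG4", "MPEG2"},
    },
    # Assumed by symmetry; the description does not list device 30's table.
    "device30": {"device20": {"H.264", "MPEG4"}, "device30": {"MPEG4", "MPEG2"}},
}


def find_codecs(start, target, tables):
    """Walk neighbor tables breadth-first; return (codecs, nodes_asked)."""
    queue, asked, seen = deque([start]), [], {start}
    while queue:
        node = queue.popleft()
        asked.append(node)
        table = tables.get(node, {})
        if target in table:
            return table[target], asked
        for neighbor in sorted(table):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    raise LookupError(target + " is not reachable")
```

Device 10 does not hold device 30's codecs itself, but its neighbor device 20 does, so the search succeeds after asking two nodes. No node needs a complete view of the network.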

Next, an example of node initialization processing is described with reference to FIG. 6. Node initialization is the processing by which, when a new node (a video/audio processing device or an MCU) is added to the network constituting the video conference system 100, the codec information of the added node is acquired. In the video conference system 100, which is built on an overlay network, a node uses distributed hash technology to store the information on the codecs it can handle in other nodes. A node also receives from other nodes the codec information that it is responsible for managing, and manages that information.
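The idea of storing a node's codec information on another node chosen by hashing can be sketched as follows. This is a generic successor-on-a-ring placement, shown only as an assumption about how the hash space might be used; the `HashRing` class, the `ring_position` helper, the SHA-1 hash, the 32-bit space, and the key format `codecs:<id>` are all hypothetical and do not come from the specification.

```python
import hashlib
from bisect import bisect_left


def ring_position(key, bits=32):
    """Map a string onto a hash space of 2**bits positions."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    def __init__(self, node_ids):
        # Each node sits at the position given by the hash of its ID.
        self.points = sorted((ring_position(n), n) for n in node_ids)
        self.store = {n: {} for n in node_ids}

    def responsible(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        idx = bisect_left(self.points, (ring_position(key), ""))
        return self.points[idx % len(self.points)][1]

    def put(self, key, value):
        self.store[self.responsible(key)][key] = value

    def get(self, key):
        return self.store[self.responsible(key)][key]


ring = HashRing(["T10", "T20", "MCU50"])
ring.put("codecs:T10", ["MPEG4"])
print(ring.get("codecs:T10"))  # ['MPEG4']
```

Because every node computes the same hash, any node can work out which peer holds a given entry without consulting a central server.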

In the following description, the i-th video/audio processing device to participate in the video conference system 100 is called the i-th video/audio processing device, and the j-th MCU to participate is called the j-th MCU. A node that has already joined the video conference system 100 means a node participating in the overlay network. Which nodes this refers to depends on the configuration of the overlay network. For example, if the configuration is to manage the information of one adjacent node, it refers only to that adjacent node; if the configuration is to manage the information of n adjacent nodes, it refers to those n adjacent nodes.

First, when the i-th video/audio processing device or the j-th MCU newly joins, it notifies the adjacent nodes of its own ID information (step S1). The nodes that have already joined the video conference system 100 acquire the ID information of the i-th video/audio processing device or the j-th MCU. They then send the i-th video/audio processing device or the j-th MCU an OK response indicating that the ID information was acquired successfully (step S2).

Next, the joining node notifies the adjacent nodes of its codec information: the codecs it can process in the case of the i-th video/audio processing device, or the codecs it can transcode in the case of the j-th MCU (step S3). The nodes that have already joined the video conference system 100 send the i-th video/audio processing device or the j-th MCU an OK response indicating that the codec information was acquired successfully (step S4).

Next, the i-th video/audio processing device or the j-th MCU notifies the adjacent nodes that it is ready to receive from them the codec information that it is to manage (step S5). The nodes that have already joined the video conference system 100 send the i-th video/audio processing device or the j-th MCU an OK response indicating that this reception request was acquired successfully (step S6). The subsequent processing is described later with reference to FIG. 7.

Through this processing, the first MCU 50, for example, stores in its storage unit 54 the information on the codecs that the n-th video/audio processing device 110, located at an adjacent node, can process and the information on the codecs that the second MCU 60 can transcode.
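The three request/response pairs of steps S1 to S6 can be sketched as a small message exchange. This is only an illustration under stated assumptions: the `JoinedNode` class, the `join` function, the message kinds `ID`, `CODECS`, and `READY`, and the `ERROR` reply are hypothetical; the specification describes only the OK responses and does not define a wire format.

```python
class JoinedNode:
    """A node that has already joined; it answers each notification with OK."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.known_ids = set()
        self.codec_table = {}
        self.ready_peers = set()

    def on_message(self, kind, sender_id, payload=None):
        if kind == "ID":                                        # S1 -> S2
            self.known_ids.add(sender_id)
        elif kind == "CODECS" and sender_id in self.known_ids:  # S3 -> S4
            self.codec_table[sender_id] = set(payload)
        elif kind == "READY" and sender_id in self.known_ids:   # S5 -> S6
            self.ready_peers.add(sender_id)
        else:
            return "ERROR"  # assumption: behaviour on failure is not specified
        return "OK"


def join(new_id, new_codecs, neighbours):
    """Run S1 to S6 against every neighbour and return the message log."""
    log = []
    for kind, payload in (("ID", None), ("CODECS", new_codecs), ("READY", None)):
        for nb in neighbours:
            reply = nb.on_message(kind, new_id, payload)
            log.append((kind, nb.node_id, reply))
            if reply != "OK":
                raise RuntimeError(f"{nb.node_id} rejected {kind}")
    return log


neighbours = [JoinedNode("T10"), JoinedNode("MCU60")]
log = join("MCU120", {"MPEG2", "MPEG4"}, neighbours)
print(len(log))  # 6
print(neighbours[0].codec_table["MCU120"] == {"MPEG2", "MPEG4"})  # True
```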

Next, an example of processing for selecting the optimal codec and MCU when a conference is held simultaneously at multiple sites is described with reference to FIG. 7. This example focuses on the processing performed by the first video/audio processing device 10.

First, the first video/audio processing device 10 requests, from the nodes that have already joined the video conference system 100, the codec information that the second video/audio processing device 20 can process (step S11). Those nodes notify the first video/audio processing device 10 of the codec information that the second video/audio processing device 20 can process, together with an OK response indicating that the request was answered successfully (step S12).

Next, the first video/audio processing device 10 requests, from the nodes that have already joined the video conference system 100, the codec information that the third video/audio processing device 30 can process (step S13). Those nodes notify the first video/audio processing device 10 of the codec information that the third video/audio processing device 30 can process, together with an OK response indicating that the request was answered successfully (step S14).

Next, the first video/audio processing device 10 requests, from the nodes that have already joined the video conference system 100, the codec information that the fourth video/audio processing device 40 can process (step S15). Those nodes notify the first video/audio processing device 10 of the codec information that the fourth video/audio processing device 40 can process, together with an OK response indicating that the request was answered successfully (step S16).

In this way, the first video/audio processing device 10 stores in the storage unit 17 the information, notified in steps S12, S14, and S16, on the codecs that each video/audio processing device can process, as well as the information on the codecs that the multipoint connection devices can transcode. When a terminal (here, the first video/audio processing device 10) acquires the codec information of a node two or more hops away, the codec information is handed from one adjacent node to the next in the manner of a bucket brigade. In general, to acquire information (here, codec information) about a terminal (here, a video/audio processing device or an MCU) that is managed by another terminal rather than by the own terminal, the own terminal first acquires the location information of the terminal in question. It then uses the acquired location information to identify that terminal and acquires the information.
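The bucket-brigade lookup can be sketched as forwarding a request from neighbour to neighbour until it reaches a node whose table holds the requested entry. The `relay_lookup` function, the single-direction `next_hop` forwarding, and the sample tables are assumptions made for illustration; the specification does not fix a routing rule.

```python
def relay_lookup(tables, next_hop, start, target):
    """Follow next_hop links from start until a node's table holds target.

    tables:   node_id -> {managed_node_id: set_of_codecs}
    next_hop: node_id -> neighbour that the request is forwarded to
    Returns (codecs, path_of_visited_nodes).
    """
    current = start
    path = [current]
    while target not in tables[current]:
        current = next_hop[current]
        if current in path:          # came back around without finding it
            raise KeyError(target)
        path.append(current)
    return tables[current][target], path


tables = {
    "T10": {"T10": {"MPEG4"}, "T20": {"MPEG4"}},
    "T20": {"T10": {"MPEG4"}, "T20": {"MPEG4"}, "T30": {"MPEG4"}},
    "T30": {"T20": {"MPEG4"}, "T30": {"MPEG4"}, "T40": {"MPEG2"}},
    "T40": {"T30": {"MPEG4"}, "T40": {"MPEG2"}},
}
next_hop = {"T10": "T20", "T20": "T30", "T30": "T40", "T40": "T10"}

codecs, path = relay_lookup(tables, next_hop, "T10", "T40")
print(codecs, path)  # {'MPEG2'} ['T10', 'T20', 'T30']
```

Here device T10 does not manage T40's entry, so the request passes through T20 and is answered by T30, which does.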

Next, based on the acquired codec information that the second video/audio processing device 20 through the fourth video/audio processing device 40 can process, the first video/audio processing device 10 selects the optimal codec so that the transcoding load on the MCU is minimized (step S17). Specifically, it selects the codec used most widely among the first video/audio processing device 10 through the fourth video/audio processing device 40. The codec used by the majority of the video/audio processing devices then needs no transcoding; only the codecs used by the minority have to be transcoded. As a result, the processing load of the video conference system 100 as a whole is reduced.
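Step S17 amounts to a majority vote over the codecs that the participating devices support. A minimal sketch, assuming each device reports a set of codec names, is shown below; the `pick_conference_codec` function, the device IDs, and the alphabetical tie-break are hypothetical additions, and the function expects at least one device with at least one codec.

```python
from collections import Counter


def pick_conference_codec(supported):
    """supported: terminal_id -> set of codecs that terminal can process."""
    counts = Counter(c for codecs in supported.values() for c in codecs)
    top = max(counts.values())
    # Tie-break alphabetically so the result is deterministic (an assumption).
    codec = min(c for c, k in counts.items() if k == top)
    needs_transcoding = sorted(
        t for t, codecs in supported.items() if codec not in codecs
    )
    return codec, needs_transcoding


supported = {
    "T10": {"MPEG4"},
    "T20": {"MPEG4"},
    "T30": {"MPEG4"},
    "T40": {"MPEG2"},
}
print(pick_conference_codec(supported))  # ('MPEG4', ['T40'])
```

With three devices on MPEG4 and one on MPEG2, only the stream for the fourth device has to pass through a transcoder.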

Next, the first video/audio processing device 10 requests, from the nodes that have already joined the video conference system 100, the ID information of all the MCUs and the codec information that these MCUs can transcode (step S18). Those nodes notify the first video/audio processing device 10 of the ID information of all the MCUs and the codec information that these MCUs can transcode, together with an OK response indicating that the request was answered successfully (step S19).

The first video/audio processing device 10 refers to the ID information of all the MCUs and selects the multipoint connection device that is to perform the transcoding, based on the information stored in the storage unit 17 on the codecs that the other video/audio processing devices can process and the codecs that the MCUs can transcode (step S20).
Thus, when transcoding is necessary, the optimal MCU can be selected from the list of MCU capabilities and a multipoint conference can be held.

Next, an example of processing for selecting the optimal MCU so that the transcoding load is minimized is described with reference to FIG. 8. In this example, the first video/audio processing device 10 selects the optimal MCU given n video/audio processing devices and m MCUs.

First, the first video/audio processing device 10 acquires the codec information that the n video/audio processing devices can use (step S21).

Next, the first video/audio processing device 10 determines whether there is a codec that all n video/audio processing devices can process in common (step S22). If there is such a codec, no transcoding is needed, so the processing ends.

If, on the other hand, there is no codec that all n video/audio processing devices can process in common, the variable j identifying the MCU is initialized to 1 (step S23). Hereinafter, the MCU indexed by j is called the j-th MCU. The j-th MCU is determined from the ID information of all the MCUs acquired in the processing of FIG. 7.

Next, the first video/audio processing device 10 acquires the codec information that the j-th MCU can transcode (step S24).
The first video/audio processing device 10 then initializes the variable i identifying the video/audio processing device to 1 (step S25). Hereinafter, the video/audio processing device indexed by i is called the i-th video/audio processing device. The i-th video/audio processing device is determined from the ID information of all the video/audio processing devices acquired in the processing of FIG. 7.

Next, the first video/audio processing device 10 determines whether there is a codec that the j-th MCU and the i-th video/audio processing device can both process (step S26). If there is no such codec, the processing moves to step S30.

If, on the other hand, there is such a codec, the first video/audio processing device 10 determines whether the variable i is greater than the constant n (step S27).
If the variable i is less than or equal to the constant n, the first video/audio processing device 10 increments the variable i by 1 (step S28) and returns to step S26.

If, on the other hand, the variable i is greater than the constant n, the first video/audio processing device 10 records in the storage unit 17 that the j-th MCU has a codec in common with every video/audio processing device examined (step S29).

Next, the first video/audio processing device 10 determines whether the variable j is greater than the constant m (step S30).
If the variable j is less than or equal to the constant m, the first video/audio processing device 10 increments the variable j by 1 (step S31) and returns to step S24.

If, on the other hand, the variable j is greater than the constant m, the first video/audio processing device 10 selects, from among the MCUs capable of the conversion, the MCU that requires the least transcoding (step S32), and the processing ends.

In this way, if there is a codec that the video/audio processing devices can use in common, that codec is used. If there is no such codec, the codecs that each video/audio processing device and each MCU can use in common are examined, and an MCU that shares at least one usable codec with every video/audio processing device is used.
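The flow of steps S21 to S32 can be sketched as follows. The nested loops over j and i are collapsed into Python iteration, and the cost used in step S32 (the number of devices that do not support the chosen conference codec) is an assumption, since the specification says only that the MCU requiring less transcoding is selected. The `select_mcu` function, the `Choice` tuple, and the sample IDs are hypothetical.

```python
from typing import NamedTuple, Optional


class Choice(NamedTuple):
    mcu: Optional[str]   # None when no transcoding is needed
    codec: str           # codec to use for the conference


def select_mcu(terminals, mcus):
    """terminals, mcus: id -> iterable of codec names."""
    term_sets = {t: set(c) for t, c in terminals.items()}

    # S21-S22: if every device shares a codec, no MCU is required.
    common = set.intersection(*term_sets.values())
    if common:
        return Choice(None, min(common))

    best = None
    for mcu_id in sorted(mcus):                    # S23, S30, S31: loop over j
        mcu_set = set(mcus[mcu_id])                # S24
        # S25-S28: the MCU must share a codec with every device i.
        if not all(mcu_set & s for s in term_sets.values()):
            continue                               # S26 "no" branch
        # S29: candidate recorded. S32 cost below is an assumed metric.
        for codec in sorted(mcu_set):
            cost = sum(codec not in s for s in term_sets.values())
            if best is None or cost < best[0]:
                best = (cost, mcu_id, codec)

    if best is None:
        raise LookupError("no MCU shares a codec with every device")
    return Choice(best[1], best[2])


terminals = {"T10": {"MPEG4"}, "T20": {"MPEG4"}, "T30": {"MPEG4"}, "T40": {"MPEG2"}}
mcus = {"MCU50": {"MPEG2", "MPEG4"}, "MCU60": {"MPEG4", "H.263"}}
print(select_mcu(terminals, mcus))  # Choice(mcu='MCU50', codec='MPEG4')
```

In this example MCU60 is rejected because it shares no codec with T40, and MCU50 is chosen with MPEG4 as the conference codec so that only one device needs transcoding.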

According to the embodiment described above, an overlay network makes it possible to manage and acquire a large amount of information, namely the codecs that the video/audio processing devices can use and the codecs that the MCUs can transcode. The video conference system 100 has no server that centrally manages the codec information of all the nodes participating in the overlay network. Its distinguishing feature is that, by using the overlay network, it nevertheless offers the same convenience as a conventional server/client system. Consequently, even when the number of nodes joining the video conference system 100 grows, an MCU capable of transcoding to the optimal codec can be selected. As a result, the transcoding load is reduced, and so is the processing performed by each node across the video conference system 100.

In addition, the codec can be selected so that the transcoding load is small. Suppose, for example, that of four video/audio processing devices, three share a common codec (for example, MPEG4) and one uses a different codec (for example, MPEG2). By using the codec that most of the video/audio processing devices can process in common (MPEG4), transcoding is performed only between MPEG2 and MPEG4. The MCU therefore transcodes only for the video/audio processing device that handles MPEG2, which reduces the processing load of the video conference system 100 as a whole.

Furthermore, the optimal MCU can be selected after acquiring the codecs that each MCU can transcode. Conventionally, the codec information and the MCU had to be selected every time a new video/audio processing device was added, but this selection can now be automated. As a result, the user is no longer burdened with configuring codecs or MCUs.

The embodiment described above is an example in which the invention is applied to a video conference system that transmits and receives audio bidirectionally. However, the invention may be applied to any system that uses bidirectional communication, for example voice communication by telephone.

The series of processes in the embodiment described above can be executed by hardware, but can also be executed by software. When the series of processes is executed by software, the programs constituting the software are installed and executed on a computer built into dedicated hardware or on, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

Needless to say, the functions can also be achieved by supplying a system or device with a recording medium on which the program code of software implementing the functions of the embodiment described above is recorded, and having a computer (or a control device such as a CPU) of that system or device read and execute the program code stored on the recording medium.

Recording media that can be used to supply the program code in this case include, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a CD-R, a magnetic tape, a non-volatile memory card, and a ROM.

The functions of the embodiment described above are realized not only when the computer executes the program code it has read. The invention also covers the case where an OS (Operating System) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiment described above are realized by that processing.

FIG. 1 is an explanatory diagram showing an example of the network configuration of a video conference system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the internal configuration of the video conference system according to an embodiment of the present invention.
FIG. 3 is a block diagram showing an example of the internal configuration of a video/audio processing device according to an embodiment of the present invention.
FIG. 4 is a block diagram showing an example of the internal configuration of an MCU according to an embodiment of the present invention.
FIG. 5 is an explanatory diagram showing an example of the network configuration of a multipoint video conference system according to an embodiment of the present invention.
FIG. 6 is a sequence diagram showing an example of processing for initializing a node according to an embodiment of the present invention.
FIG. 7 is a sequence diagram showing an example of processing for selecting the optimal codec and MCU according to an embodiment of the present invention.
FIG. 8 is a flowchart showing an example of processing for selecting the optimal MCU so that the transcoding load is minimized according to an embodiment of the present invention.

Explanation of symbols

1: imaging unit; 2: microphone; 3: display unit; 4: speaker; 5: video processing unit; 6: audio processing unit; 7: network interface; 10: first video/audio processing device; 11: analog/digital conversion unit; 12: digital/analog conversion unit; 13: analog/digital conversion unit; 14: digital/analog conversion unit; 15: first codec management table; 16: control unit; 17: storage unit; 18: input unit; 20: second video/audio processing device; 25: second codec management table; 30: third video/audio processing device; 40: fourth video/audio processing device; 50: first MCU; 51: video transcoding unit; 52: audio transcoding unit; 53: control unit; 54: storage unit; 55: network interface; 60: second MCU; 100: video conference system

Claims

A video conference system using an overlay network in which at least one node is allocated on a hash space by a distributed hash table technique, the nodes including a plurality of data processing devices that encode data with a codec and a multipoint connection device that transcodes the data encoded by the plurality of data processing devices to another codec, wherein
the data processing device includes:
a first storage unit that stores information on the codecs it can process; and
a data processing unit that encodes data with a first codec based on the codec information stored in the first storage unit, and
the multipoint connection device includes:
a second storage unit that stores information on the codecs that the adjacent nodes can process; and
a transcoding unit that transcodes the data processed with the first codec to a second codec based on the information, stored in the second storage unit, on the codecs that the adjacent nodes can process.

The video conference system according to claim 1, wherein
the data processing device:
stores, in the first storage unit, information on the codecs that the other data processing devices can process and information on the codecs that the multipoint connection device can transcode; and
selects the multipoint connection device based on the information, stored in the first storage unit, on the codecs that the other data processing devices can process and the codecs that the multipoint connection device can transcode.

The video conference system according to claim 2, wherein
the multipoint connection device
stores, in the second storage unit, information on the codecs that the data processing devices located at the adjacent nodes can process and information on the codecs that another multipoint connection device can transcode.

A video conference method using an overlay network in which at least one node is allocated on a hash space by a distributed hash table technique, the nodes including a plurality of data processing devices that encode data with a codec and a multipoint connection device that transcodes the data encoded by the plurality of data processing devices to another codec, wherein
the data processing device:
stores information on the codecs it can process; and
encodes data with a first codec based on the stored codec information, and
the multipoint connection device:
stores information on the codecs that the adjacent nodes can process; and
transcodes the data processed with the first codec to a second codec based on the information on the codecs that the adjacent nodes can process.

A program for use with an overlay network in which at least one node is allocated on a hash space by a distributed hash table technique, the nodes including a plurality of data processing devices that encode data with a codec and a multipoint connection device that transcodes the data encoded by the plurality of data processing devices to another codec, wherein
the program causes the data processing device to execute:
a first storage process of storing information on the codecs it can process; and
a codec process of encoding data with a first codec based on the stored codec information, and
causes the multipoint connection device to execute:
a second storage process of storing information on the codecs that the adjacent nodes can process; and
a transcoding process of transcoding the data encoded with the first codec to a second codec based on the information, stored by the second storage process, on the codecs that the adjacent nodes can process.