JP2012049748A

JP2012049748A - Sound volume controller, sound volume control method and program

Info

Publication number: JP2012049748A
Application number: JP2010188886A
Authority: JP
Inventors: Sond-Pa Jeong; 松波鄭; Katsuaki Akama; 勝明赤間
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-08-25
Filing date: 2010-08-25
Publication date: 2012-03-08
Anticipated expiration: 2030-08-25
Also published as: JP5609431B2

Abstract

[Problem] To provide a technique for controlling the volume of streaming broadcasts, telephone conferences, video conferences, and the like, and in particular to appropriately control the volume of sounds corresponding to a plurality of objects.
[Solution] A volume control device that controls the volume of a sound corresponding to acquired data comprises: specifying means for specifying an object corresponding to the sound; acquisition means for acquiring, from storage means that stores volume control information corresponding to objects, the volume control information corresponding to the object specified by the specifying means; and control means for controlling the volume of the sound according to the volume control information acquired by the acquisition means.
[Selected drawing] FIG. 3

Description

The present invention relates to a volume control device, a volume control method, and a program, and more particularly to a device, method, and program for controlling the volume of a sound corresponding to an object.

Streaming broadcast technology, which delivers video captured by a camera installed at one site to other sites over a network, has been put into practical use. Technology for connecting terminals at multiple sites over a network to hold a voice-only telephone conference is also known. Furthermore, video conferencing, which uses video as well as audio by having both sides send and receive video, has also been put into practical use.

The following prior art document concerns volume control in the above telephone and video conferences when multiple terminals are connected. It discloses a communication device capable of simultaneous calls with multiple terminals, comprising means for calculating the overall average of the volumes received from the terminals and adjusting the volume of each terminal so that it matches that average.

JP 2004-32296 A

The technique disclosed in the above prior art document adjusts the volume per terminal. It works when each terminal has a single speaker, but the following problem arises when several people speak through one terminal. The technique calculates the overall average from the volume received from each terminal. When two people whose voices differ greatly in volume speak through one terminal, a level intermediate between the two is used as that terminal's volume in the calculation. The terminal's volume is then adjusted by an adjustment value determined from the calculated overall average. Because the terminal has two speakers, however, adjusting the volume of the terminal as a whole leaves the loud speaker loud and the quiet speaker quiet.
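The limitation described above can be checked with a small arithmetic sketch. All numbers are hypothetical, and treating the terminal's level as the arithmetic mean of the two speakers' dB levels is a simplifying assumption made only for illustration; the prior art document does not specify this calculation.

```python
def terminal_gain_db(terminal_level_db: float, overall_average_db: float) -> float:
    """Gain that moves one terminal's level onto the overall average (per-terminal adjustment)."""
    return overall_average_db - terminal_level_db


# Hypothetical levels (dB) of two people sharing one terminal.
loud_db, quiet_db = -10.0, -30.0
# Assumption: the terminal is represented by an intermediate level.
terminal_level_db = (loud_db + quiet_db) / 2
# Hypothetical average over all connected terminals.
overall_average_db = -18.0

gain_db = terminal_gain_db(terminal_level_db, overall_average_db)
adjusted_loud_db = loud_db + gain_db
adjusted_quiet_db = quiet_db + gain_db

# A single per-terminal gain shifts both voices equally,
# so the gap between the two speakers is unchanged.
gap_before_db = loud_db - quiet_db
gap_after_db = adjusted_loud_db - adjusted_quiet_db
print(gain_db, gap_before_db, gap_after_db)
```

Whatever gain is chosen for the terminal, the difference between the two speakers stays the same, which is the problem the invention addresses.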

It is also conceivable to simply level the volume of the audio received from a terminal. In that case, however, everything is adjusted to the same volume no matter how much inflection the speaker uses, so the speaker's nuance is not conveyed to the listener.

The present invention relates to a technique for controlling the volume of streaming broadcasts, telephone conferences, video conferences, and the like, and in particular aims to appropriately control the volume of sounds corresponding to a plurality of objects.

One aspect of a device for solving the above problem is a volume control device that controls the volume of a sound corresponding to acquired data, comprising: specifying means for specifying an object corresponding to the sound; acquisition means for acquiring, from storage means that stores volume control information corresponding to objects, the volume control information corresponding to the object specified by the specifying means; and control means for controlling the volume of the sound according to the volume control information acquired by the acquisition means.

According to the disclosed technique, the volume during the period in which an object is outputting sound is adjusted by the predetermined level set for that object, so the adjustment level is constant within that period. The inflection within the output is therefore preserved while the difference in volume level between objects is reduced. As a result, in techniques for controlling the volume of streaming broadcasts, telephone conferences, video conferences, and the like, the volume of the sound corresponding to each object can be controlled appropriately.

FIG. 1 is an explanatory diagram showing the volume relationship based on a comparison of facial features and of lip motion magnitude.
FIG. 2 is an example of a hardware configuration diagram of a communication terminal used in the present invention.
FIG. 3 is an example of a block configuration diagram of a communication terminal used in the present invention.
FIG. 4 is a flowchart of setting the volume increase value for a specific speaker in an embodiment of the present invention.
FIG. 5 is a per-speaker conference volume adjustment value table according to the present invention (Example 1).
FIG. 6 is an explanatory diagram of setting the volume increase level value for recorded conference audio according to the present invention.
FIG. 7 is a flowchart (Example 1) showing the procedure of the conference volume adjustment method during a conference, based on FIG. 5.
FIG. 8 is a sequence diagram (part 1) showing the signal flow between the functional blocks of the communication terminal according to the present invention.
FIG. 9 is a sequence diagram (part 2) showing the signal flow between the functional blocks of the communication terminal according to the present invention.
FIG. 10 is a sequence diagram (part 3) showing the signal flow between the functional blocks of the communication terminal according to the present invention.
FIG. 11 is a per-speaker conference volume adjustment value table according to the present invention (Example 2).
FIG. 12 is a flowchart (Example 2) showing the procedure of the conference audio adjustment method during a conference, based on FIG. 11.

Hereinafter, as an embodiment of the present invention, an example in which the technique is applied to a communication terminal such as a camera-equipped mobile phone is described as an example of a volume control device. In this embodiment, in a video conference using video captured by the camera of the communication terminal, the volume for each object is adjusted based on a volume adjustment value set for that object. Here, an object means a speaker who speaks in the conference call.

FIG. 1 is an explanatory diagram showing the volume relationship based on a comparison of facial features and of lip motion magnitude. It contrasts the facial features and lip motion magnitude of Mr. A (1) and Mr. B (2).
(a) Facial features and lip motion magnitude
Mr. A (1) has a round face and a large lip motion width; Mr. B (2) has a long face and a small lip motion width. To recognize the lip motion width, the communication terminal monitors the lip movement of the person who is speaking.
(b) Volume of Mr. A and Mr. B
Mr. B's voice is assumed to be quieter than Mr. A's voice.
(c) Volume increase adjustment for Mr. B
In this embodiment, the volume of the entire audio data is raised by the set value for Mr. B (+A dB) only while the quiet-voiced Mr. B is speaking, which makes Mr. B's voice easier to hear. Conversely to the embodiment described below, the same effect could be obtained by lowering the volume of the entire audio data by a set value for Mr. A (for example, ΔB dB) only while the loud-voiced Mr. A is speaking.
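As a minimal sketch of adjustment (c), the following applies a constant per-speaker gain only while that speaker is talking. The per-sample speaker labels, the function names, and the 20 dB value are illustrative assumptions; the description gives only the symbolic value +A dB at this point.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_speaker_gain(samples, active_speaker, gains_db):
    """Scale each sample by the gain registered for whoever is speaking.

    samples        : audio samples as floats
    active_speaker : label of the current speaker for each sample (hypothetical input)
    gains_db       : mapping from speaker label to a constant gain in dB
    """
    return [
        s * db_to_linear(gains_db.get(who, 0.0))
        for s, who in zip(samples, active_speaker)
    ]


gains_db = {"A": 0.0, "B": 20.0}          # illustrative values
samples = [0.5, -0.5, 0.02, -0.04]
active_speaker = ["A", "A", "B", "B"]
adjusted = apply_speaker_gain(samples, active_speaker, gains_db)
print(adjusted)
```

Because the gain is constant for the whole time B speaks, the ratio between B's loud and soft samples is unchanged, which is how inflection survives the adjustment.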

FIG. 2 is an example of a hardware configuration diagram of a communication terminal used in the present invention. The communication terminal picks up the voices of conference participants and sends them to a remote location over a public network or the like. It also has a display unit 5 and a camera unit 10 that detects the speaker's facial features and lip motion width.

The wireless communication unit 4 consists of an RF (Radio Frequency) unit 41 and a BB (Broadband) unit 42. Under the control of the processor 11, it connects to a wireless communication network via the antenna 3 and carries the voice communication information from the audio input/output unit 8 and the image information from the camera unit 10, performing wireless communication with a communication terminal installed at a remote location.

The display unit 5 and the input unit 9 consist of an image display and a row of keys for input operations, both provided on the operation surface of the communication terminal.

The audio input/output unit 8 handles audio input from the microphone 6, which picks up the voices of conference participants, and audio output to the speaker 7, which conveys the speaker's voice to the conference participants.

Under the control of the processor 11, the camera unit 10 captures the faces of conference participants, content such as documents and images, and what is written on a blackboard.

The storage unit 12 has a RAM 121 and a ROM 122 and is used in the operation of the communication terminal under the control of the processor 11.

FIG. 3 is an example of a functional block configuration diagram of a communication terminal used in the present invention. As described for FIG. 2, the communication terminal picks up the voices of conference participants and sends them to a remote location over a public network or the like. It also has a display control unit 21 that assumes a projector and a camera control unit 22 that detects the speaker's facial features and lip motion width.

Furthermore, when the processor 11 of FIG. 2 executes the program stored in the program storage unit 121, the terminal operates as the object specifying unit 13, the volume control information storage unit 14, the volume increase/decrease level acquisition unit 15, the conference audio adjustment unit 17, the call recording unit 19, and the control units described below. The program can be distributed by recording it on a recording medium such as a CD-ROM. It is also conceivable to store the program on a server device on a network so that a terminal device can download and install it.

The object specifying unit 13 adds an object specifying function to the communication terminal; specifically, it identifies and specifies the speaker.

The volume control information storage unit 14 is storage means that stores volume control information corresponding to objects. It holds the volume control information in the form of the conference volume adjustment value table.

The volume increase/decrease level acquisition unit 15 acquires volume control information from the volume control information storage unit 14, where it is stored in the form of the conference volume adjustment value table.

The conference audio adjustment unit 17 adds a conference audio adjustment function to the communication terminal and is started when the terminal is powered on. In cooperation with the camera control unit 22, the microphone control unit 18, and the call recording unit 19, it authenticates and registers the face data of a specific speaker through image analysis, and it measures and registers the speaker's lip motion width. It also determines and registers a volume increase level value by analyzing a voice recording of the specific speaker. Specifically, even when a loud-voiced person and a quiet-voiced person speak through the same communication terminal, the speaker is identified by detecting the speaker's facial features and lip motion width, and the registered volume increase level value is read out and used for the adjustment.

In response to an instruction from the conference audio adjustment unit 17, the speaker control unit 16 outputs the voice of a quiet-voiced person with the output volume raised by the volume increase level registered in the volume control information storage unit 14 (the conference volume adjustment value table).

The input control unit 20 and the display control unit 21 control the operation of the input/output keys and the image display provided on the operation surface of the communication terminal.

FIG. 4 is a flowchart of setting the volume increase value for a specific speaker in an embodiment of the present invention. The procedure for setting the conference volume adjustment value for a specific speaker is described below.

S1. The conference audio adjustment unit 17 determines whether to adjust the volume of a specific conference participant's voice while that participant is speaking. The result is YES when the user performs a predetermined operation on the communication terminal (for example, selecting the process from a menu).

S2. When S1 determines that a volume adjustment value is to be set, the conference audio adjustment unit 17 determines the number of specific speakers from the camera image.

S3. The conference audio adjustment unit 17 determines whether utterance volume adjustment values have been registered for that number of specific speakers.

S4. If registration for all persons has not been completed in S3, the conference audio adjustment unit 17 determines the number of people currently speaking in the conference.

S5. If there is exactly one speaker, the conference audio adjustment unit 17 determines whether that speaker is already registered.

S6. If the speaker is not registered in S5, the conference audio adjustment unit 17 registers the specific speaker's face recognition data in the conference volume adjustment value table.

S7. The conference audio adjustment unit 17 records the specific speaker's utterance and measures its volume.

S8. The conference audio adjustment unit 17 extracts the specific speaker's lip motion width and registers it in the conference volume adjustment value table in association with the speaker's face recognition data.

S9. The conference audio adjustment unit 17 determines whether the loudness of the voice is at or above a threshold.

S10. If the loudness is below the threshold in S9, the conference audio adjustment unit 17 determines the volume increase level.

S11. If the loudness is at or above the threshold in S9, the conference audio adjustment unit 17 sets the volume increase level to zero.

S12. As volume control information, the conference audio adjustment unit 17 sets the adjustment value for the specific speaker's utterance in the conference volume adjustment value table in association with the speaker's face recognition data. Instead of an adjustment value, the volume itself could be set as the volume control information. In that case, control sets the volume of the sound corresponding to the acquired data to that set volume.
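Steps S5 to S12 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the threshold and reference levels, the dictionary layout, and the rule "increase = reference level minus measured level" are hypothetical choices, since the flowchart only says that an increase level is determined when the voice is below the threshold.

```python
def volume_increase_level_db(measured_db: float,
                             threshold_db: float,
                             reference_db: float) -> float:
    """S9-S11: zero increase at or above the threshold, otherwise the shortfall."""
    if measured_db >= threshold_db:
        return 0.0                      # S11: loud enough, no increase
    return reference_db - measured_db   # S10: assumed rule (shortfall to a reference)


def register_speaker(table: dict, face_id: str, lip_width_cm: float,
                     measured_db: float,
                     threshold_db: float = -25.0,    # hypothetical threshold
                     reference_db: float = -20.0):   # hypothetical reference level
    """S5-S12: register one speaker unless already present in the table."""
    if face_id in table:                # S5: already registered
        return table[face_id]
    entry = {
        "lip_width_cm": lip_width_cm,   # S8
        "increase_db": volume_increase_level_db(
            measured_db, threshold_db, reference_db),  # S9-S11
    }
    table[face_id] = entry              # S6 and S12
    return entry


table = {}
register_speaker(table, "A", 4.0, measured_db=-18.0)
register_speaker(table, "B", 1.0, measured_db=-40.0)
print(table)
```

With these illustrative inputs, speaker A receives no increase and speaker B receives the shortfall to the reference level.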

FIG. 5 is a per-speaker conference volume adjustment value table according to the present invention (Example 1). The face recognition data of speaker A is set to A, a data value corresponding to a round or long face. The lip motion width of speaker A was measured as 4 cm. The loudness of speaker A's voice is determined from this face recognition data and lip motion width.

The volume increase level value for speaker A is determined by comparison with the loudness of all conference participants' voices. In this example, speaker A's voice is sufficiently loud and needs no increase, so the volume increase level is 0 dB.

The face recognition data of speaker B is set to B, a data value corresponding to a round or long face. The lip motion width of speaker B was measured as 1 cm. The loudness of speaker B's voice is determined from this face recognition data and lip motion width.

The volume increase level value for speaker B is determined by comparison with the loudness of all conference participants' voices. In this example, speaker B's voice is quiet and needs a considerable increase, so the volume increase level is +20 dB.
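One way to hold the table of FIG. 5 in memory is sketched below. The field values (A: 4 cm, 0 dB; B: 1 cm, +20 dB) come from the description; the class name, field names, and the default of 0 dB for an unknown speaker are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerRecord:
    face_recognition_data: str    # data value identifying the face ("A" or "B")
    lip_motion_width_cm: float    # measured lip motion width
    volume_increase_db: float     # volume increase level


# Contents of the conference volume adjustment value table (Example 1).
CONFERENCE_VOLUME_TABLE = {
    "A": SpeakerRecord("A", 4.0, 0.0),
    "B": SpeakerRecord("B", 1.0, 20.0),
}


def lookup_increase_db(face_recognition_data: str, default_db: float = 0.0) -> float:
    """Return the registered increase level, or a default for unknown faces (assumption)."""
    record = CONFERENCE_VOLUME_TABLE.get(face_recognition_data)
    return record.volume_increase_db if record is not None else default_db


print(lookup_increase_db("A"), lookup_increase_db("B"))
```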

FIG. 6 is an explanatory diagram of setting the volume increase level value for recorded conference audio according to the present invention. The waveforms of Mr. A's and Mr. B's lip motion width in FIG. 6 show the lip motion width over time, based on the speaker lip motion widths given in FIG. 5.

From the recorded utterance audio, the portion in which only Mr. B is speaking is cut out using the lip motion width, and its volume is measured. Mr. B's volume is then compared with the reference volume. Because it is very low (below a predetermined threshold), the shortfall is applied as the volume increase level, here plus 20 dB. FIG. 6 shows the waveform before the volume increase level is applied. Once it is applied, the volume is raised only for Mr. B's voice, as in the waveform of (c), the volume increase adjustment for Mr. B, in FIG. 1. The reference volume is a standard volume at which each individual utterance can be heard clearly.
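The cut-out-and-measure step can be sketched roughly as below. Several things here are assumptions rather than details from the description: lip motion width is given per audio sample (in practice it would come from video frames at a lower rate), lips count as moving above a hypothetical 0.5 cm, and volume is measured as RMS level in dB relative to full scale.

```python
import math


def rms_db(samples) -> float:
    """RMS level of the samples in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def solo_segment(samples, lip_width_target, lip_width_other, moving_cm=0.5):
    """Keep only samples where the target's lips move and the other's do not."""
    return [
        s for s, target, other in zip(samples, lip_width_target, lip_width_other)
        if target >= moving_cm and other < moving_cm
    ]


samples = [0.5, -0.5, 0.05, -0.05, 0.05, -0.05]
lip_a = [4.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # Mr. A speaks first
lip_b = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]   # then only Mr. B speaks

b_only = solo_segment(samples, lip_b, lip_a)
b_level_db = rms_db(b_only)
reference_db = -6.0                       # hypothetical reference volume
shortfall_db = reference_db - b_level_db  # candidate volume increase level
print(len(b_only), round(b_level_db, 1), round(shortfall_db, 1))
```

The shortfall computed this way plays the role of the volume increase level that the description attaches to Mr. B.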

FIG. 7 is a flowchart (Example 1) showing the procedure of the conference volume adjustment method during a conference, based on FIG. 5. It follows the flowchart for setting the volume increase value for a specific speaker shown in FIG. 4 and covers conference volume adjustment after the volume increase value has been set.

The procedure of the conference volume adjustment method during a conference is described below with reference to FIG. 3.

S13. The conference audio adjustment unit 17 determines whether a video conference is in progress.

S14. The object specifying unit 13 determines whether a speaker's lip motion width has been detected in the camera monitor image, and proceeds to the next step if it has.

S15. The object specifying unit 13 recognizes the speaker's facial features (round face, long face, and so on).

S16. If no lip motion can be detected in S14, the conference audio adjustment unit 17 judges that nobody is speaking and returns the volume level to its initial value.

S17. The object specifying unit 13 determines whether multiple speakers are speaking at the same time.

S18. When S17 detects that multiple speakers are speaking at the same time, the volume increase/decrease level acquisition unit 15 extracts, from the volume control information storage unit 14 holding the conference volume adjustment value table, the records whose face recognition data matches the facial features recognized in S15, and reads the value set as the volume increase level in each record. Once the volume increase levels of all the registered simultaneous speakers have been obtained, the volume level is raised by the largest of those increase levels.

S19. When S17 detects that only one person is speaking, the volume increase/decrease level acquisition unit 15 extracts, from the volume control information storage unit 14 holding the conference volume adjustment value table, the record whose face recognition data matches the facial features recognized in S15, and reads the value set as the volume increase level in that record. The volume level is then raised by the extracted volume increase level.
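The branch logic of S14 to S19 can be sketched as a single selection function. The list of currently speaking face identifiers stands in for the camera analysis, and treating an unregistered face as 0 dB is an assumption; neither detail is specified in the flowchart.

```python
def select_volume_increase_db(active_faces, table, initial_db: float = 0.0) -> float:
    """Choose the volume increase for the current moment.

    active_faces : face identifiers whose lip motion is currently detected
    table        : mapping from face identifier to registered increase level (dB)
    """
    if not active_faces:
        return initial_db                    # S16: nobody speaking, back to initial value
    levels = [table.get(face, 0.0) for face in active_faces]  # unknown face -> 0 dB (assumption)
    if len(levels) > 1:
        return max(levels)                   # S18: several speakers, use the largest increase
    return levels[0]                         # S19: single speaker, use that speaker's level


table = {"A": 0.0, "B": 20.0}
print(select_volume_increase_db([], table))          # silence
print(select_volume_increase_db(["A"], table))       # A alone
print(select_volume_increase_db(["A", "B"], table))  # A and B together
```

Using the maximum in the simultaneous case follows S18 literally; it favors audibility of the quietest speaker at the cost of also amplifying the louder one.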

FIG. 8 is a sequence diagram (part 1) showing the signal flow between the functional blocks of the communication terminal according to the present invention. The procedure for setting the conference volume adjustment value is described below.

S20. When the conference audio adjustment unit 17 is started and setting of the increase value for raising the volume of a quiet voice begins, the conference audio adjustment unit 17 requests the camera control unit 22 to capture a camera image.

S21. The camera control unit 22 instructs the camera unit 10 to start capturing a camera image and passes the camera image to the conference audio adjustment unit 17.

S22. The conference audio adjustment unit 17 analyzes the image and determines the number of conference participants.

S23. The conference audio adjustment unit 17 instructs the camera control unit 22 to start the camera monitor in order to begin image monitoring.

S24. The camera control unit 22 passes the camera monitor image to the conference audio adjustment unit 17.

S25. The conference audio adjustment unit 17 detects the lip motion of a single speaker in the camera monitor image and authenticates that speaker's facial features.

S26. If the specific speaker's facial features have not yet been registered, the conference audio adjustment unit 17 registers the face data of the authenticated specific speaker.

FIG. 9 is a sequence diagram (part 2) showing the signal flow between the functional blocks of the communication terminal according to the present invention. The procedure for setting the conference volume adjustment value is described below.

S27. The conference audio adjustment unit 17 requests the call recording unit 19 to start recording and instructs the microphone control unit 18 to turn the microphone on.

S28. The call recording unit 19 starts voice recording.

S29. The conference audio adjustment unit 17 waits, for example, 10 seconds from the start of recording, then requests the call recording unit 19 to stop recording and instructs the microphone control unit 18 to turn the microphone off.

S30. The call recording unit 19 passes the recorded voice of the specific speaker to the conference audio adjustment unit 17.

S31. The conference audio adjustment unit 17 analyzes the loudness of the specific speaker's recorded voice received from the call recording unit 19.

S32. When the specific speaker's volume level is found to be below the threshold, the conference audio adjustment unit 17 determines the volume increase level for that speaker. When the volume level is found to be at or above the threshold, the speaker's volume is not increased.

S33. As volume control information, the conference audio adjustment unit 17 sets the adjustment value for the specific speaker's utterance in the conference volume adjustment value table in association with the speaker's facial feature recognition data.

S34. The volume control information storage unit 14 stores the per-speaker volume control information based on the conference volume adjustment value table.

図１０は、本発明による通信端末の各機能ブロック間の信号の流れを示すシーケンス図（その３）である。以下に会議音量調整値の設定手順について説明する。 FIG. 10 is a sequence diagram (No. 3) showing a signal flow between the functional blocks of the communication terminal according to the present invention. The procedure for setting the conference volume adjustment value will be described below.

Ｓ３５．会議音声調整部１７は、映像会議中を確認し、カメラ制御部２２に対してカメラ画像の撮影を要求する。 S35. The conference audio adjustment unit 17 confirms that the video conference is in progress and requests the camera control unit 22 to capture a camera image.

Ｓ３６．カメラ制御部２２は、カメラ画像の撮影を開始し、カメラ画像を対象物特定部１３へ通知する。 S36. The camera control unit 22 starts capturing a camera image and notifies the object specifying unit 13 of the camera image.

Ｓ３７．対象物特定部１３は、カメラ画像から発話者の唇動作幅を検出する。 S37. The object specifying unit 13 detects the lip movement width of the speaker from the camera image.

Ｓ３８．対象物特定部１３は、発話者の顔相（丸顔とか面長顔等）を認識する。 S38. The object specifying unit 13 recognizes the speaker's facial features (for example, a round face or a long face).

Ｓ３９．対象物特定部１３は、前述のＳ３７．Ｓ３８により特定話者を特定し、音量増減レベル取得部１５に対して特定話者の記録データの検索依頼を行なう。 S39. The object specifying unit 13 identifies the specific speaker through S37 and S38 described above, and requests the volume increase/decrease level acquisition unit 15 to search for the recorded data of the specific speaker.

Ｓ４０．音量増減レベル取得部１５は、全発話者の音量制御情報が記録されている音量制御情報記憶部１４の会議音量調整値テーブルから該当する音量増加レベルを抽出する。 S40. The volume increase/decrease level acquisition unit 15 extracts the corresponding volume increase level from the conference volume adjustment value table in the volume control information storage unit 14, in which the volume control information of all speakers is recorded.

Ｓ４１．音量制御情報記憶部１４は、該当する増加レベル値を音量増減レベル取得部１５に通知するとともに、会議音声調整部１７へ通知する。 S41. The volume control information storage unit 14 reports the corresponding increase level value to the volume increase/decrease level acquisition unit 15 and to the conference audio adjustment unit 17.

Ｓ４２．会議音声調整部１７は、スピーカ制御部に対して抽出された音量増加レベル分、音量レベルを増加させる。 S42. The conference audio adjustment unit 17 causes the speaker control unit to increase the volume level by the extracted volume increase level.
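
Steps S39 to S42 amount to a table lookup followed by a gain change. The sketch below is one possible reading, not the patented implementation: `SpeakerOutput` is a hypothetical stand-in for the speaker control unit, gains are expressed in dB as an assumption, and the increase is applied relative to the initial level so that repeated detections of the same speaker do not accumulate (also an assumption; the source only says the level is increased by the extracted amount).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SpeakerOutput:
    """Hypothetical stand-in for the speaker control unit (gain in dB)."""
    initial_gain_db: float = 0.0
    gain_db: float = 0.0

    def apply_increase(self, increase_db: float) -> None:
        # Assumption: the increase is relative to the initial gain,
        # so calling this twice for the same speaker does not stack.
        self.gain_db = self.initial_gain_db + increase_db


def adjust_for_speaker(table: Dict[str, float],
                       face_id: Optional[str],
                       output: SpeakerOutput) -> float:
    """S39 to S42: look up the identified speaker's increase level and
    apply it. Unknown or unidentified speakers get no increase."""
    increase = table.get(face_id, 0.0) if face_id is not None else 0.0
    output.apply_increase(increase)
    return output.gain_db
```

Here `face_id` plays the role of the result of S37 and S38; how lip movement and facial features are turned into such a key is outside the scope of this sketch.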

ここまで説明した実施の形態では、発話者を特定するための技術として、唇の動作検出の技術を用いた例を説明したが、他の動作を検出して発話者を特定するようにしてもよい。例えば、唇以外の部位(あご、頬等)の動作や、頭の動作等を検出するようにしてもよい。 In the embodiments described so far, lip motion detection has been used as the technique for identifying the speaker, but the speaker may instead be identified by detecting other motions. For example, the motion of a part other than the lips (chin, cheeks, etc.) or the motion of the head may be detected.

また、発話者と音量増加レベルの対応付けを行うに際し、発話者を一意に特定するための技術として、顔認識の技術を用いた例を説明したが、顔以外の部位を用いて特定されてもよい。また、本発明を実施するにあたり、必ずしも画像解析の技術を用いる必要はない。以下に、他の実施例を説明する。 In addition, face recognition has been described as the technique for uniquely identifying the speaker when associating a speaker with a volume increase level, but the speaker may be identified using a part other than the face. Furthermore, implementing the present invention does not necessarily require image analysis. Another embodiment is described below.

画像を用いない音声のみの電話会議を行う場合の例として、携帯電話等の通信端末を用いて発話者の音声を取得し、取得された音声を、公衆回線等のネットワークを介して他の通信端末に送信する場合を想定する。ハードウェア構成、及び機能ブロック図は、図２及び３と同様であるが、図２のカメラ部１０及び図３のカメラ制御部２２は不要である。 As an example of a voice-only telephone conference that does not use images, assume that a communication terminal such as a mobile phone acquires the speaker's voice and transmits the acquired voice to another communication terminal via a network such as a public line. The hardware configuration and the functional block diagram are the same as those in FIGS. 2 and 3, except that the camera unit 10 in FIG. 2 and the camera control unit 22 in FIG. 3 are unnecessary.

本実施の形態では、予め各発話者の声紋データと音量増加レベルを会議音量調整値テーブルに記録しておくものとする。ここで言う声紋データとは、人物の声を分析し、特徴を抽出した値のことを言うものとする。 In this embodiment, the voiceprint data and the volume increase level of each speaker are assumed to be recorded in advance in the conference volume adjustment value table. Voiceprint data here means values obtained by analyzing a person's voice and extracting its features.

図１１は、本発明による話者ごとの会議音量調整値テーブル（実施例２）である。当該テーブルには、話者Ａ及び話者Ｂの話者識別子に対応付けて、それぞれの声紋データが設定される。 FIG. 11 is a conference volume adjustment value table for each speaker according to the present invention (second embodiment). In the table, the voiceprint data of speaker A and speaker B is set in association with their respective speaker identifiers.
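
One way to picture the table of FIG. 11 is as a list of records, each holding a speaker identifier, voiceprint data, and a volume increase level. The sketch below makes several assumptions that are not in the source: voiceprint data is modeled as a fixed-length feature vector, matching uses cosine similarity with a hypothetical cutoff `min_similarity`, and the names `SpeakerRecord` and `find_record` are invented for illustration. The source does not specify how voiceprints are compared.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SpeakerRecord:
    speaker_id: str                 # e.g. "A" or "B"
    voiceprint: Sequence[float]     # assumed: fixed-length feature vector
    volume_increase_db: float       # assumed unit: dB


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_record(table: List[SpeakerRecord], features: Sequence[float],
                min_similarity: float = 0.9) -> Optional[SpeakerRecord]:
    """Return the best-matching record at or above the cutoff, else None."""
    best: Optional[SpeakerRecord] = None
    best_score = min_similarity
    for record in table:
        score = cosine_similarity(record.voiceprint, features)
        if score >= best_score:
            best, best_score = record, score
    return best
```

Returning `None` for an unmatched voice leaves it to the caller to decide how unregistered speakers are handled, which the source does not address.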

図１２は、図１１に基づく会議中の会議音量調整方法の手順を示すフローチャート（実施例２）である。以下に図３を参照しつつ会議音量調整値の設定手順について説明する。 FIG. 12 is a flowchart (second embodiment) showing the procedure of the conference volume adjustment method during a conference, based on FIG. 11. The procedure for setting the conference volume adjustment value is described below with reference to FIG. 3.

Ｓ４３．会議音声調整部１７は、映像会議中か否かを判断する。 S43. The conference audio adjustment unit 17 determines whether the video conference is in progress.

Ｓ４４．対象物特定部１３は、音声が入力されたか否かを判定し、入力されたと判定された場合には次のステップへ移行する。 S44. The object specifying unit 13 determines whether voice has been input and, if so, proceeds to the next step.

Ｓ４５．対象物特定部１３は、発話者の声紋を解析する。 S45. The object specifying unit 13 analyzes the voiceprint of the speaker.

Ｓ４９．前述のＳ４４にて音声の入力が検出できないとき、会議音声調整部１７は、発話者がいないと判断して音量レベルを初期値に戻す。 S49. When no voice input is detected in S44 described above, the conference audio adjustment unit 17 determines that there is no speaker and returns the volume level to its initial value.

Ｓ４６．対象物特定部１３は、Ｓ４５の声紋認識の結果に基づいて、発話者を特定する。 S46. The object specifying unit 13 specifies the speaker based on the result of the voiceprint recognition in S45.

Ｓ４７．対象物特定部１３は、複数発話者が同時に発話をしているか否かを判断する。 S47. The object specifying unit 13 determines whether a plurality of speakers are speaking at the same time.

Ｓ４８．前述のＳ４７にて複数発話者が同時に発話しているのを検出すると、音量増減レベル取得部１５は、Ｓ４６で認識された声紋データが記録されているレコードを、会議音量調整値テーブルに基づく音量制御情報記憶部１４から抽出する。そして、当該レコードの音量増加レベルに設定されている値を抽出する。登録済みの複数発話者全員分の音量増加レベルが取得できたら、各音量増加レベルの内、最大増加レベル分、音量レベルを増加させる。 S48. When it is detected in S47 described above that a plurality of speakers are speaking at the same time, the volume increase/decrease level acquisition unit 15 extracts, from the volume control information storage unit 14 based on the conference volume adjustment value table, the records in which the voiceprint data recognized in S46 is recorded. It then extracts the value set as the volume increase level in each record. Once the volume increase levels of all the registered speakers who are speaking have been obtained, the volume level is increased by the largest of those increase levels.

Ｓ５０．前述のＳ４７にて発話しているのが一人であるのを検出すると、音量増減レベル取得部１５は、Ｓ４６で認識された声紋データが記録されているレコードを、会議音量調整値テーブルに基づく音量制御情報記憶部１４から抽出する。そして、当該レコードの音量増加レベルに設定されている値を抽出する。その後、抽出された音量増加レベル分、音量レベルを増加させる。 S50. When it is detected in S47 described above that only one person is speaking, the volume increase/decrease level acquisition unit 15 extracts, from the volume control information storage unit 14 based on the conference volume adjustment value table, the record in which the voiceprint data recognized in S46 is recorded. It then extracts the value set as the volume increase level in that record. The volume level is then increased by the extracted volume increase level.
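
The branching in S47 to S50 can be condensed into a single function: with no speaker the level returns to its initial value, with one speaker that speaker's increase level is applied, and with several simultaneous speakers the largest of their increase levels is applied. The following is a minimal sketch under assumptions: speakers are represented by identifier strings already produced by S46, levels are in dB, the function name `conference_volume` is hypothetical, and speakers missing from the table are simply ignored (the source does not say how unregistered speakers are treated).

```python
from typing import Dict, Iterable


def conference_volume(initial_level_db: float,
                      table: Dict[str, float],
                      active_speakers: Iterable[str]) -> float:
    """Return the output level for the speakers identified as talking."""
    # Collect increase levels of the registered speakers who are talking.
    increases = [table[s] for s in active_speakers if s in table]
    if not increases:
        # S49: nobody registered is speaking, so use the initial value.
        return initial_level_db
    # S50 (one speaker) and S48 (several speakers) collapse into one
    # expression, because max() of a one-element list is that element.
    return initial_level_db + max(increases)
```

For example, with `table = {"A": 4.0, "B": 1.0}` and an initial level of 0.0, speaker B alone yields 1.0, while A and B speaking together yield 4.0, so the quieter participant's setting governs the output.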

上述の実施の形態では、声紋によって発話者を特定する例を説明したが、それに限定するものではなく、発話者の声の特徴を利用するものであって発話者を特定することができるものであればよい。更に、発話者を特定できればよいため、必ずしも声の特徴を用いる必要はない。例えば、発話者は必ず発話者ごとに割り振られた携帯電話の所定のキーを押下してから発言をするようにし、この押下されたキーに基づいて発話者を特定しても構わない。 In the above-described embodiment, an example in which the speaker is identified by a voiceprint has been described, but the invention is not limited to this; any method that uses characteristics of the speaker's voice and can identify the speaker may be used. Moreover, since it is sufficient that the speaker can be identified, voice characteristics need not be used at all. For example, each speaker may be required to press a predetermined key of the mobile phone assigned to that speaker before speaking, and the speaker may be identified based on the pressed key.

また、上述の実施の形態では、人物の音声の音量を調整する例を説明したが、本発明の適用分野としては、これに限られるものではない。例えば、楽器や機械等の音を発する物体を対象とし、これらの発する音を通信端末が取得して、これを他の通信端末に送信する例も考えられる。この場合であっても、これらの物体(または物体の一部)の動きや音の特徴を検知することで、いずれの物体に対応する音であるかを決定し、決定された物体に対して設定されている音量増加レベルに基づいて音量を制御することが可能である。 In the above-described embodiment, an example of adjusting the volume of a person's voice has been described, but the field of application of the present invention is not limited to this. For example, the target may be an object that emits sound, such as a musical instrument or a machine, with a communication terminal acquiring the emitted sound and transmitting it to another communication terminal. Even in this case, by detecting the motion or sound characteristics of these objects (or parts of them), it is possible to determine which object a sound corresponds to and to control the volume based on the volume increase level set for the determined object.

１Ａさんの顔相
２Ｂさんの顔相
３アンテナ
４無線通信装置
４１ＲＦ
４２ＢＢ
５表示部
６マイク
７スピーカ
８音声入出力部
９入力部
１０カメラ部
１１プロセッサ
１２記憶部
１２１ＲＡＭ
１２２ＲＯＭ
１３対象物特定部
１４音量制御情報記憶部
１５音量増減レベル取得部
１６スピーカ制御部
１７会議音声調整部
１８マイク制御部
１９通話記録部
２０入力制御部
２１表示制御部
２２カメラ制御部

DESCRIPTION OF SYMBOLS
1 Facial features of Mr. A
2 Facial features of Mr. B
3 Antenna
4 Radio communication device
41 RF
42 BB
5 Display unit
6 Microphone
7 Speaker
8 Audio input/output unit
9 Input unit
10 Camera unit
11 Processor
12 Storage unit
121 RAM
122 ROM
13 Object specifying unit
14 Volume control information storage unit
15 Volume increase/decrease level acquisition unit
16 Speaker control unit
17 Conference audio adjustment unit
18 Microphone control unit
19 Call recording unit
20 Input control unit
21 Display control unit
22 Camera control unit

Claims

A volume control device for controlling the volume of sound corresponding to acquired data, comprising:
specifying means for specifying an object corresponding to the sound;
acquisition means for acquiring, from storage means storing volume control information corresponding to objects, the volume control information corresponding to the object specified by the specifying means; and
control means for controlling the volume of the sound according to the volume control information acquired by the acquisition means.

A volume control method for controlling the volume of sound corresponding to acquired data, the method comprising:
specifying an object corresponding to the sound;
acquiring, from storage means storing volume control information corresponding to objects, the volume control information corresponding to the specified object; and
controlling the volume of the sound according to the acquired volume control information.

A program for causing a computer to function as:
specifying means for specifying an object corresponding to sound;
acquisition means for acquiring, from storage means storing volume control information corresponding to objects, the volume control information corresponding to the object specified by the specifying means; and
control means for controlling the volume of the sound according to the volume control information acquired by the acquisition means.