KR100238470B1

KR100238470B1 - Multiple Speaker Selection Apparatus and Method for Video Conference System

Info

Publication number: KR100238470B1
Application number: KR1019950045695A
Authority: KR
Inventors: 박정훈
Original assignee: 윤종용 (Yun Jong-yong); Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 1995-11-30
Filing date: 1995-11-30
Publication date: 2000-01-15
Anticipated expiration: 2015-11-30
Also published as: KR970032095A

Abstract

1. Technical field to which the claimed invention belongs

A circuit and method for selecting multiple speakers for audio mixing in a video conferencing system.

2. Technical problem to be solved by the invention

To provide an apparatus and method for accurately selecting multiple speakers in real time.

3. Summary of the solution

Over a fixed period, the invention counts the number of times the sign of each voice signal stays in the same state and the number of times its magnitude exceeds a reference value, and at the speaker selection time it selects the speakers by comparing these sign and magnitude counts.

4. Principal use of the invention

It is used to accurately select the voice of the person actually speaking by exploiting the characteristics of the voice signal.

Description

Multiple Speaker Selection Apparatus and Method for Video Conference System

FIG. 1 is an exemplary configuration diagram of a video conferencing system according to the present invention.

FIG. 2 is a configuration diagram of the video conference control device of FIG. 1.

FIG. 3 is a detailed configuration diagram of the audio mixing unit of FIG. 1.

FIG. 4 is a flowchart of the comparison process for selecting multiple speakers according to the present invention.

FIG. 5 is a flowchart of the process of selecting speakers at the speaker selection time based on the comparison results of FIG. 4.

FIG. 6a is a graph showing the characteristics of a voice signal.

FIG. 6b is a voice signal waveform diagram illustrating the case in which only one speaker is selected by comparing magnitudes.

FIG. 6c is a voice signal waveform diagram of a first terminal, illustrating speaker selection by comparing signs and magnitudes.

FIG. 6d is a voice signal waveform diagram of a second terminal, illustrating speaker selection by comparing signs and magnitudes.

* Explanation of reference numerals for the main parts of the drawings

2a: ISDN BRI interface unit    2c: H.221 framing unit

2b: ISDN PRI interface unit    2d: data separation/extraction unit

2e: audio mixing unit    2f: video switching unit

2g: data processing unit    2h: control unit

2i: multiplexer    2j: H.221 reframing unit

10: decoder    12: sign check unit

14: comparison unit    16: count unit

18: speaker selection unit    20: encoder

The present invention relates to a circuit and method for selecting multiple speakers for audio mixing in a video conferencing system, and more particularly to a circuit and method for selecting a speaker's voice using the characteristics of the voice signal.

Common forms of video conferencing include point-to-point conferencing and multipoint conferencing, in which several participants form a single group.

Taking as an example a video conference that uses the H.221 frame structure of the ITU-T H-series international standards, FIG. 1 shows an example configuration of a video conferencing system according to the present invention. The video conference control device 100 controls the terminals T1 to Tn so that they can participate in the video conference simultaneously over ISDN.

FIG. 2 is a detailed configuration diagram of the video conference control device 100 of FIG. 1. The data lines connecting the units use MVIP (Multi-Vendor Integration Protocol). The operation of each unit is as follows.

The line interface units 2a and 2b perform the signaling procedure specified in ITU-T Q.931 to connect the video conference control device to the ISDN network; either the BRI unit 2a or the PRI unit 2b alone may be provided. The H.221 framing unit 2c searches for and establishes the frame structure defined in H.221, synchronizing to the FAS (Frame Alignment Signal) and BAS (Bit-rate Allocation Signal). The demultiplexer 2d separates the synchronized stream of audio, video, and data into independent streams and places them on the bus. The audio mixing unit 2e consists of an energy measurement part and a mixing part; it extracts the terminals with the largest acoustic energy from the audio data received from the demultiplexer 2d and mixes them. The video switching unit 2f extracts and supplies the image of the desired terminal from the terminal images received from the demultiplexer 2d. The data processing unit 2g processes the data received from the demultiplexer 2d.
The control unit 2h controls input and output according to the control codes received from the H.221 framing unit 2c and supervises the overall operation of the video conference control device in accordance with CCITT Recommendations H.231, H.242, and H.243. The data combining unit 2i is a multiplexing module that recombines the data processed by modules 2e, 2f, and 2g into a single data structure. The H.221 reframing unit 2j converts the data from the data combining unit 2i back into the H.221 frame structure, and the converted data is transmitted to the ISDN network through the line interface unit 2a or 2b.

The video conference control device 100 configured as above uses a different control method depending on whether the conference is run in audio mode or chair mode. In audio mode, the speaker is found from the level of the audio signals and the voices are mixed. In chair mode, the chair terminal designates the speaker, so no separate speaker search is needed for audio processing.

Two conventional methods of selecting a single speaker, so that the current speaker's voice can be delivered to the conference participants, are described below.

First, the voice signal levels of the conference participants are compared and the loudest signal is tentatively chosen as the speaker; if that value exceeds a fixed threshold, it is confirmed as the true speaker. FIG. 6b shows a voice waveform illustrating this case.

If this method of simply comparing voice signal magnitudes is applied to selecting multiple speakers, speakers can only be selected in order of loudness at every input. Consequently, if noise occurs and is recognized as speech, there is no way to prevent it from being selected. As a result, when selecting the main speaker (who acts as chair of the meeting) and the remaining sub-speakers, the main speaker may erroneously change at every input.
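To make the failure mode concrete, the following is a minimal sketch of the first conventional method applied sample by sample: pick the loudest terminal and accept it only if it exceeds a threshold. The function name `loudest_per_sample`, the threshold, and the sample values are hypothetical and chosen only for illustration; they are not taken from the patent's figures.

```python
def loudest_per_sample(frames: dict[str, list[int]], threshold: int) -> list:
    """Conventional method 1: at each sample index, choose the terminal with
    the largest absolute amplitude, or None if it does not exceed threshold."""
    names = list(frames)
    length = len(frames[names[0]])
    picks = []
    for i in range(length):
        loudest = max(names, key=lambda name: abs(frames[name][i]))
        picks.append(loudest if abs(frames[loudest][i]) > threshold else None)
    return picks


# Hypothetical data: T1 is talking steadily, T2 is silent except for one
# momentary noise spike at sample index 2.
frames = {
    "T1": [1000, -1000, 1000, -1000],
    "T2": [0, 0, 5000, 0],
}
picks = loudest_per_sample(frames, threshold=500)
print(picks)  # ['T1', 'T1', 'T2', 'T1']
```

A single spike is enough to hand the floor to T2 for that input, which is the instability the text attributes to magnitude-only selection.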

Second, instead of comparing voice signal magnitudes directly, the autocorrelation coefficients of each signal are computed over a fixed sampling period, and the speaker is determined by comparing the magnitudes of those coefficients.

Computing and comparing autocorrelation coefficients over a period sharply increases the amount of computation at the moment the coefficients are calculated, which makes real-time selection of multiple speakers difficult for the conference system. Shortening the period to reduce this load reintroduces the same errors as the first method.
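The cost argument can be seen from a direct autocorrelation sketch. The text does not say how many lags the conventional method uses or which coefficient it compares, so the lag count here is an assumption; the point is only that the work grows with (window length) x (number of lags), whereas the sign/magnitude counting described later needs a couple of comparisons per sample. The function name `autocorrelation` is hypothetical.

```python
def autocorrelation(x: list[int], max_lag: int) -> tuple[list[int], int]:
    """Direct (unnormalized) autocorrelation r[k] = sum_i x[i] * x[i + k]
    for k = 0..max_lag. Also returns the number of multiplications used."""
    n = len(x)
    r = []
    multiplications = 0
    for k in range(max_lag + 1):
        acc = 0
        for i in range(n - k):
            acc += x[i] * x[i + k]
            multiplications += 1
        r.append(acc)
    return r, multiplications


r, mults = autocorrelation([1, 2, 3], max_lag=1)
print(r, mults)  # [14, 8] 5
```

For illustration only: over a 10-second window at 125 μs per sample (80,000 samples), lags 0 through 10 would take 879,945 multiply-accumulates per terminal, compared with roughly two comparisons per sample for sign and magnitude counting.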

In conclusion, methods that compare voice signal magnitudes or compare autocorrelation coefficients over a fixed period are unsuitable for video conferencing systems that must select three or more speakers.

Accordingly, an object of the present invention is to provide an apparatus and method for accurately selecting multiple speakers in real time.

To achieve this object, the present invention counts, over a fixed period, the number of times the sign of each voice signal stays in the same state and the number of times its magnitude exceeds a reference value, and selects the speakers by comparing these sign and magnitude counts at the speaker selection time.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Note that, wherever possible, identical components are given the same reference numerals even when they appear in different drawings. The following description includes many specific details, such as particular circuit components; these are provided to aid a general understanding of the present invention, and it will be evident to those skilled in the art that the invention can be practiced without them. Detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the present invention.

As shown in FIG. 3, the audio signal processing part of the video conferencing system according to this embodiment consists of an A/μ-law PCM data decoder, a speaker selection part, and an A/μ-law PCM data encoder. The input signal is a PCM signal as specified in ITU-T Recommendation G.711.
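Before the sign and magnitude checks, the decoder 10 turns each G.711 byte into a linear sample. The sketch below shows the widely published μ-law expansion (the same bit manipulation as the classic `ulaw2linear` routine); A-law decoding is omitted, and the function name `ulaw_decode` is hypothetical rather than anything named in the patent.

```python
def ulaw_decode(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a linear sample
    (range -32124..32124)."""
    code = ~code & 0xFF            # mu-law codes are transmitted bit-inverted
    sign = code & 0x80             # after inversion, bit 7 set means negative
    exponent = (code >> 4) & 0x07  # 3-bit segment number
    mantissa = code & 0x0F         # 4-bit step within the segment
    magnitude = ((mantissa << 3) + 0x84) << exponent
    return (0x84 - magnitude) if sign else (magnitude - 0x84)


samples = [ulaw_decode(b) for b in (0xFF, 0xFE, 0x7E, 0x80, 0x00)]
print(samples)  # [0, 8, -8, 32124, -32124]
```

Because the most significant bit of a μ-law byte already carries the sign (clear for the negative half, apart from the code that decodes to zero), a sign check could in principle read it without a full expansion; the magnitude comparison, however, needs the linear value.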

The operation of this embodiment is described below assuming that, with G.711 frames of 80 samples, the input sample period is 125 μs, and that speaker selection is performed once every 10 seconds.
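The timing figures above fix how much data each selection window covers. This short calculation uses only the numbers stated in the text; the observation that a 17-bit counter is wide enough is a derived consequence (no count can exceed the number of samples in a window), not a statement from the patent.

```python
SAMPLE_PERIOD_US = 125      # input sample period stated in the text
SAMPLES_PER_FRAME = 80      # samples per frame stated in the text
SELECTION_INTERVAL_S = 10   # speaker selection interval stated in the text

sample_rate_hz = 1_000_000 // SAMPLE_PERIOD_US                     # 8 kHz
frame_duration_ms = SAMPLES_PER_FRAME * SAMPLE_PERIOD_US / 1000     # ms per frame
samples_per_selection = SELECTION_INTERVAL_S * sample_rate_hz        # per terminal
frames_per_selection = samples_per_selection // SAMPLES_PER_FRAME
counter_bits = samples_per_selection.bit_length()  # bits needed for the largest count

print(sample_rate_hz, frame_duration_ms, samples_per_selection,
      frames_per_selection, counter_bits)
# 8000 10.0 80000 1000 17
```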

As shown in FIG. 6a, a voice signal oscillates between positive and negative values. This embodiment exploits that characteristic.

Referring to FIG. 4, variables are initialized in step 4a, and in step 4b a participant's voice signal is input from a terminal participating in the conference. Step 4c checks whether the sign of the input sample is positive or negative, and the result is stored as a binary 0 or 1 in step 4d or 4e. In step 4f the sample value is compared with a predetermined reference value; if it is larger, the magnitude count is incremented in step 4g. Step 4h then compares the current sign with the previously stored sign, and if they differ, the sign count is incremented in step 4i. This operation is repeated for each participant.
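The per-sample loop of FIG. 4 can be sketched as follows. This follows the flowchart text (the sign count is incremented when the current sign differs from the stored one); the abstract and claims phrase the sign count in terms of the sign staying the same, so treat the comparison direction as the description's reading. Assumptions not fixed by the text: 0 encodes a non-negative sample and 1 a negative one, zero counts as non-negative, the magnitude test uses the absolute value, and the first sample of a window is not counted as a sign change. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminalCounts:
    prev_sign: Optional[int] = None  # assumed encoding: 0 = non-negative, 1 = negative
    sign_count: int = 0
    magnitude_count: int = 0


def update_counts(c: TerminalCounts, sample: int, reference: int) -> None:
    sign = 1 if sample < 0 else 0                 # steps 4c-4e: store sign as 0/1
    if abs(sample) > reference:                   # steps 4f-4g: magnitude count
        c.magnitude_count += 1
    if c.prev_sign is not None and sign != c.prev_sign:
        c.sign_count += 1                         # steps 4h-4i: sign changed
    c.prev_sign = sign


def count_block(counts: dict[str, TerminalCounts],
                block: dict[str, list[int]], reference: int) -> None:
    """Repeat the per-sample update for every participating terminal."""
    for terminal, samples in block.items():
        state = counts.setdefault(terminal, TerminalCounts())
        for s in samples:
            update_counts(state, s, reference)


counts: dict[str, TerminalCounts] = {}
count_block(counts, {"T1": [100, -200, 300, -50, 40]}, reference=60)
print(counts["T1"].sign_count, counts["T1"].magnitude_count)  # 4 3
```

Counts persist across calls to `count_block`, so frames can be fed in as they arrive until the selection time; resetting (step 4a) would mean starting a fresh dictionary.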

This process repeats until the speaker selection time arrives. At that point, as shown in FIG. 5, variables are initialized in step 5a, and the sign and magnitude counts of each terminal are input in step 5b. In step 5c the sign counts are compared first, and three speakers are selected in descending order of sign count. If two or more terminals have the same sign count, the process proceeds to step 5e, where the terminal with the largest magnitude count is selected as the speaker; if that magnitude count is below the reference value, the previously selected speaker's voice signal is retained as the speaker. If no two terminals share the same sign count, the process proceeds to step 5f, and the terminal with the maximum sign count is chosen as the speaker.
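The selection step of FIG. 5 can be sketched as below, assuming the counts arrive as a hypothetical mapping from terminal id to `(sign_count, magnitude_count)`. `select_speaker` follows the single-speaker path described above, including the rule that keeps the previous speaker when the tie-breaking magnitude count falls below the reference. `select_speakers` is an assumed generalization to the three-speaker case: rank by sign count and break ties by magnitude count; the text does not spell out how the retention rule applies there, so it is left out.

```python
from typing import Optional

# Hypothetical input format: terminal id -> (sign_count, magnitude_count)
Counts = dict[str, tuple[int, int]]


def select_speaker(counts: Counts, magnitude_reference: int,
                   previous: Optional[str]) -> Optional[str]:
    """Single-speaker selection following the FIG. 5 description."""
    best_sign = max(sign for sign, _ in counts.values())
    tied = [t for t, (sign, _) in counts.items() if sign == best_sign]
    if len(tied) == 1:                              # no tie: step 5f
        return tied[0]
    winner = max(tied, key=lambda t: counts[t][1])  # tie: step 5e
    if counts[winner][1] < magnitude_reference:
        return previous                             # keep the previous speaker
    return winner


def select_speakers(counts: Counts, n: int = 3) -> list[str]:
    """Top-n terminals by sign count, ties broken by magnitude count."""
    return sorted(counts, key=lambda t: counts[t], reverse=True)[:n]


counts = {"T1": (4, 3), "T2": (4, 4), "T3": (2, 9)}
print(select_speaker(counts, magnitude_reference=1, previous=None))  # T2
print(select_speaker(counts, magnitude_reference=5, previous="T3"))  # T3
print(select_speakers(counts))  # ['T2', 'T1', 'T3']
```

Sorting on the `(sign_count, magnitude_count)` tuple gives the "sign first, magnitude on ties" order in one pass; with `previous=None`, a tie that fails the magnitude reference yields no speaker.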

The method is explained further below with reference to voice signal waveforms.

FIG. 6b is a voice signal waveform diagram illustrating the case in which only one speaker is selected by comparing magnitudes; it represents the conventional method. With this method, a momentary loud voice or ambient noise is recognized as a speaker. When someone is actually speaking, the waveform is continuous; but when a loud sound occurs momentarily while no one is speaking, because of noise or other environmental causes, the waveform changes abruptly, and detecting this change causes it to be mistaken for a speaker.

FIG. 6c is the voice signal waveform of a first terminal and FIG. 6d is that of a second terminal, both illustrating speaker selection by comparing signs and magnitudes. Assuming that time T1 is the speaker selection time, the sign counts of the waveforms in FIGS. 6c and 6d at T1 are both 4. The magnitude count, however, is 3 for FIG. 6c and 4 for FIG. 6d, so the terminal producing the waveform of FIG. 6d is selected as the speaker.
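The worked example can be reproduced end to end with invented sample sequences that produce the same counts as FIGS. 6c and 6d (sign counts 4 and 4, magnitude counts 3 and 4). The sample values and the reference of 500 are hypothetical, not read off the figures, and the counting follows the flowchart reading (a sign change increments the sign count). Comparing `(sign_count, magnitude_count)` tuples applies the sign count first and the magnitude count only on a tie.

```python
def counts_for(samples: list[int], reference: int) -> tuple[int, int]:
    """Return (sign_count, magnitude_count) for one terminal's window."""
    sign_count = magnitude_count = 0
    prev = None
    for s in samples:
        sign = 1 if s < 0 else 0
        if abs(s) > reference:
            magnitude_count += 1
        if prev is not None and sign != prev:
            sign_count += 1
        prev = sign
    return sign_count, magnitude_count


# Hypothetical windows up to the selection time T1.
windows = {
    "terminal_6c": [800, -900, 700, -200, 100],  # 4 sign changes, 3 above 500
    "terminal_6d": [800, -900, 700, -600, 100],  # 4 sign changes, 4 above 500
}
counts = {t: counts_for(w, reference=500) for t, w in windows.items()}
winner = max(counts, key=lambda t: counts[t])
print(counts, winner)
# {'terminal_6c': (4, 3), 'terminal_6d': (4, 4)} terminal_6d
```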

In a conference system, this embodiment can perform in software, in real time, not only the selection of a single speaker but also the selection of several speakers (around three).

In summary, this embodiment handles the following two speaker-selection situations.

First, when the conference system delivers one speaker's voice to the other participants, it must select the person who is actually speaking. Checking the sign count shows whether a participant is continuing to speak, and when several people are speaking, checking the magnitude count selects the one with the highest level.

Second, when the voices of the participants currently speaking must be mixed and transmitted to the listeners, the sign counts are checked and as many speakers as needed are selected in descending order of count.

As described above, by exploiting the characteristics of the voice signal, the present invention avoids the memory and heavy computation required by the autocorrelation method, which gives it advantages in cost and processing time. It can be applied without modification both to audio mixing for controlling the video conferencing system and to the selection of three speakers or one speaker required in voice-activated mode. Moreover, in conference systems using ITU-T G.711 (PCM), G.722 (ADPCM), or G.728 (LD-CELP), the speaker selection module can be implemented simply by changing the reference value.

Although specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from its scope. The scope of the invention should therefore not be limited to the described embodiments, but should be defined by the appended claims and their equivalents.

Claims

In a conference control apparatus that controls a plurality of terminals to participate simultaneously in a video conference and that has a sign counter and a magnitude counter for the audio signal of each participating terminal, a method of selecting a speaker at fixed intervals for mixing the audio of the participating terminals, the method comprising: checking whether a speaker selection time has arrived; checking the sign count of each terminal participating in the video conference up to the speaker selection time and selecting the terminal having the maximum value; and, if the check shows that the sign counts of two or more terminals are equal, checking the magnitude counts and selecting the terminal having the maximum value.

The method of claim 1, wherein performing the sign and magnitude counts for each terminal comprises: inputting a voice signal from a terminal participating in the video conference; checking the sign of the input voice signal and storing the previous sign; converting the input voice signal into a pulse-code-modulated signal; comparing the magnitude of the pulse-code-modulated signal with a predetermined reference value and incrementing the magnitude count by a unit value if the magnitude is larger than the reference value; and checking whether the current sign is the same as the previously stored sign and incrementing the sign count by a unit value if they are the same.

A video conferencing system having a plurality of terminals and a conference control apparatus that performs video processing and audio mixing so that the plurality of terminals participate in a video conference simultaneously, wherein the conference control apparatus comprises: a voice input unit for inputting a voice signal from any terminal participating in the video conference; a signal converter for converting the input voice signal into a pulse-code-modulated signal; a sign counter and a magnitude counter; a memory in which the terminals participating in the video conference are registered and which stores a predetermined binary code; a counter control unit that checks the sign of the voice signal input from a registered terminal and assigns a binary code according to whether it is positive or negative, checks whether the binary code currently stored in the memory is the same as the previously stored binary code and, if so, increments the sign counter by a unit value, and compares the magnitude of the pulse-code-modulated signal with a predetermined reference value and increments the magnitude counter by a unit value if the reference value is exceeded; and a speaker selector that checks the sign count of each terminal participating in the video conference up to a fixed period, selects the terminal having the maximum value as the speaker if there is only one such terminal, and, if the sign counts of two or more terminals are equal, checks the magnitude counts of those terminals and selects the terminal having the maximum value as the speaker.