
WO2018163418A1 - Communication system, api server used in communication system, headset, and portable communication terminal - Google Patents

Info

Publication number
WO2018163418A1
WO2018163418A1 PCT/JP2017/009756 JP2017009756W WO2018163418A1 WO 2018163418 A1 WO2018163418 A1 WO 2018163418A1 JP 2017009756 W JP2017009756 W JP 2017009756W WO 2018163418 A1 WO2018163418 A1 WO 2018163418A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
utterance
voice
server
headset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2017/009756
Other languages
French (fr)
Japanese (ja)
Inventor
雄太 楢崎
貴大 宮坂
俊介 粟飯原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bonx Inc
Original Assignee
Bonx Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bonx Inc
Priority to PCT/JP2017/009756 (WO2018163418A1)
Priority to CN202110473317.7A (CN113114866A)
Priority to CN201880015280.XA (CN110663244B)
Priority to EP23183175.1A (EP4239992A3)
Priority to JP2018526268A (JP6416446B1)
Priority to US16/490,766 (US20200028955A1)
Priority to PCT/JP2018/008697 (WO2018164165A1)
Priority to EP18764411.7A (EP3595278B1)
Publication of WO2018163418A1
Priority to JP2018187677A (JP6742640B2)
Priority to JP2018187678A (JP6815654B2)
Anticipated expiration
Priority to JP2020207754A (JP7219492B2)
Legal status: Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • the present invention relates to a communication system, an API server, a headset, and a mobile communication terminal used in the communication system.
  • In a communication system for many-to-many group calls, the mobile communication terminals held by the callers are registered in advance as a group, and a many-to-many call within the group is performed by exchanging voice data, in which the speaker's voice is encoded, between the mobile communication terminals registered in the group.
  • The voice data encoding a speaker's voice is transmitted to the mobile communication terminals of the callers participating in the group via a VoIP server.
  • Transmitting voice data via the VoIP server in this way reduces the communication load of many-to-many calls within the group to some extent.
  • A headset connected to the mobile communication terminal by short-range wireless communication such as Bluetooth (registered trademark) is also used in such a communication system. With the headset, the mobile communication terminal can pick up the caller's voice even when the caller is not holding the terminal by hand, and it can convey the conversation sent from the other party's mobile communication terminal to the caller even when the caller is not holding the terminal by hand.
  • When the audio is encoded as audio data and transmitted as is, the size of the audio data to be transmitted per unit time can exceed the communicable bandwidth, resulting in transmission delay of the audio data.
  • Sizing the audio data transmitted per unit time to fit the communicable bandwidth improves the transmission delay to some extent. In this case, however, the quality of the sound obtained when the transmitted audio data is decoded deteriorates, and a good conversation cannot be held.
  • Environmental noise such as wind noise, crowd noise, construction sound, mining sound, and engine sound can also be a problem when talking on snowy mountains, at sea, in crowds, or at construction sites, quarries, airfields, and the like.
  • The microphone used for the call picks up the ambient sound in addition to the speaker's utterance, and audio data in which the speaker's utterance and the environmental sound are mixed is encoded and transmitted to the mobile communication terminals of the call participants. This environmental sound not only lowers the S/N ratio; audio data that contains only environmental sound, with no utterance by the speaker, is also transmitted unnecessarily and causes data delay and the like.
  • an object of the present invention is to provide a communication system capable of performing a comfortable group call even in an environment with a weak radio wave environment or a large environmental sound.
  • the communication system of the present invention comprises the following three means, and solves the above-mentioned problems that occur in many-to-many communication within a group by linking them together.
  • The present invention provides a communication system for performing a group call between a plurality of clients via a VoIP server, comprising an API server for managing the group call. Each client includes a mobile communication terminal that communicates via a mobile communication network and a headset that exchanges audio data with the mobile communication terminal by short-range wireless communication.
  • The headset includes: a voice detection unit that detects voice; a speech emphasizing unit that emphasizes the utterance portion included in the voice detected by the voice detection unit relative to the environmental sound; and a playback control unit that plays back the voice data received from the mobile communication terminal so that the utterance portion of the voice data is relatively easy to hear against the ambient noise detected by the voice detection unit.
  • The mobile communication terminal includes: a noise estimation unit that estimates noise included in the voice data received from the headset; an utterance candidate determination unit that determines, based on the result of estimation by the noise estimation unit, a range of the voice data that is a candidate for an utterance portion; an utterance determination unit that determines the portion that is human speech within the candidate range determined by the utterance candidate determination unit; a voice data transmission unit that transmits the portion of the voice data determined to be human speech by the utterance determination unit to the VoIP server; and a reproduction voice data transmission unit that transmits the voice data received from the VoIP server to the headset.
  • The API server includes a communication quality control unit that, based on the communication status between the clients and the VoIP server, notifies the clients and the VoIP server of commands related to control of the communication quality of the group call. The voice data transmission unit encodes the portion of the voice data determined to be human voice by the utterance determination unit with a communication quality based on the command notified from the communication quality control unit, and transmits it to the VoIP server.
  • The present invention reduces the amount of data transferred over a mobile network in a many-to-many group call, thereby reducing power consumption in the portable communication terminals and headsets and suppressing voice delay even when the communication bandwidth is insufficient. Furthermore, because only the utterance period is detected automatically, noise is reduced hands-free and without interfering with other activities, and only the content of the other party's utterance is conveyed clearly, so the user experience (UX) can be greatly improved.
  • FIG. 1 is a schematic configuration diagram of a communication system according to an embodiment of the present invention.
  • FIG. 2 is a schematic functional block diagram of an API server according to an embodiment of the present invention.
  • FIG. 3 is a schematic functional block diagram of a mobile communication terminal according to an embodiment of the present invention.
  • FIG. 4 is a schematic functional block diagram of a headset according to an embodiment of the present invention. FIG. 5 is a sequence chart showing the flow of processing executed on the headset and the mobile communication terminal for the speech detection function according to an embodiment of the present invention. A further figure shows the image of the conversion until the audio
  • FIG. 1 is a diagram showing a schematic configuration of a communication system according to the present invention.
  • The communication system 300 according to the present invention includes at least one server 1 and a plurality of clients 2 that can connect to the server 1 via a mobile network such as GSM (registered trademark), 3G (registered trademark), 4G (registered trademark), WCDMA (registered trademark), or LTE (registered trademark).
  • The server 1 includes at least a VoIP (Voice over Internet Protocol) server 11 for controlling voice communication between the clients 2, and at least one of the servers 1 included in the communication system 300 includes an API (Application Programming Interface) server 10 that manages the connection of the clients 2 and the allocation of the VoIP server 11.
  • the server 1 may be configured by a single server computer, or may be configured by preparing a plurality of server computers and implementing the respective functions on the respective server computers. Each server 1 may be distributed and arranged in each region in the world.
  • The server computer constituting the server 1 includes a CPU, storage devices (main storage device, auxiliary storage device, etc.) such as ROM, RAM, and a hard disk, an I/O circuit, and the like.
  • The server 1 is connected to a wide area network in accordance with a communication standard suitable for wired communication such as TCP/IP, and is configured to communicate with other servers 1 through the wide area network.
  • The API server 10 serves as a management server: when a many-to-many group call is performed, it exchanges the information necessary for the group call with the plurality of clients 2 participating in it and, based on the information obtained, instructs the VoIP server 11 so as to realize the group call among those clients 2.
  • the API server 10 is mounted on a server computer that constitutes the server 1.
  • The API server 10 can instruct not only the VoIP server 11 arranged in the same server 1 but also other VoIP servers 11 connectable via the network. This allows the API server 10 to identify the geographical position of each client 2 from information such as the IP addresses of the clients 2 participating in the group call, select a VoIP server 11 to which those clients 2 can connect with low latency, and distribute the clients 2 to that VoIP server 11. In addition, the API server 10 can detect a VoIP server 11 with a low operating rate among the plurality of VoIP servers 11 and distribute the clients 2 to that VoIP server 11.
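The server selection described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the source: the `VoipServer` record, the use of great-circle distance as a stand-in for latency, the `max_rate` cutoff, and the `rate_weight_km` weighting that trades distance against operating rate. The list of client positions is assumed to be non-empty.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class VoipServer:
    """Hypothetical record of what the API server knows about a VoIP server."""
    name: str
    lat: float
    lon: float
    operating_rate: float  # 0.0 (idle) to 1.0 (fully loaded)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def select_voip_server(servers, client_positions, max_rate=0.8, rate_weight_km=1000.0):
    """Pick the server with the lowest combined distance and load score.

    Distance stands in for latency (an assumption). Servers whose operating
    rate exceeds max_rate are skipped unless every server exceeds it.
    """
    candidates = [s for s in servers if s.operating_rate <= max_rate] or list(servers)

    def score(server):
        mean_km = sum(
            haversine_km(lat, lon, server.lat, server.lon)
            for lat, lon in client_positions
        ) / len(client_positions)
        return mean_km + rate_weight_km * server.operating_rate

    return min(candidates, key=score)
```

A real deployment would more likely measure round-trip time than infer it from geography; distance is used here only because the text mentions locating clients by IP address.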
  • the VoIP server 11 has a role of controlling voice packet exchange (conversation) between the clients 2 in response to an instruction from the API server 10.
  • the VoIP server 11 is mounted on a server computer constituting the server 1.
  • the VoIP server 11 may be configured as a software switch of a known IP-PBX (Internet Protocol-Private Branch Exchange).
  • the VoIP server 11 has a function of processing voice packets on-memory in order to realize a real-time call between the clients 2.
  • the client 2 includes a mobile communication terminal 20 provided by a user and a headset 21 connected to the mobile communication terminal 20 by near field communication such as Bluetooth communication.
  • the mobile communication terminal 20 has a role of performing voice packet communication control in a voice call by a user.
  • The mobile communication terminal 20 is an information terminal, such as a tablet terminal or a smartphone, that includes a CPU, storage devices (main storage device, auxiliary storage device, etc.) such as ROM, RAM, and a memory card, and an I/O circuit, and that is designed with a size, shape, and weight that allow the user to carry it.
  • The mobile communication terminal 20 is configured to be able to communicate with the server 1 and other clients 2 via a wide area network connected to a base station (not shown), in accordance with a communication standard suitable for long-distance wireless communication such as GSM (registered trademark), 3G (registered trademark), 4G (registered trademark), WCDMA (registered trademark), or LTE (registered trademark).
  • The mobile communication terminal 20 is configured to be able to communicate audio data with the headset 21 in accordance with a short-range wireless communication standard such as Bluetooth (registered trademark) (hereinafter referred to as the "first short-range wireless communication standard").
  • In addition, communication with the mobile communication terminal 20 at short distance is possible in accordance with a short-range wireless communication standard that can communicate with less power than the first short-range wireless communication standard, such as BLE (Bluetooth Low Energy) (registered trademark) (hereinafter referred to as the "second short-range wireless communication standard").
  • The headset 21 has the role of creating voice data based on the voice uttered by the user, transmitting the created voice data to the mobile communication terminal 20, and playing back voice based on the voice data transmitted from the mobile communication terminal 20.
  • The headset 21 includes a CPU, storage devices (main storage device and auxiliary storage device) such as ROM, RAM, and a memory card, and I/O circuits such as a microphone and a speaker.
  • The headset 21 is configured to be able to communicate audio data with the mobile communication terminal 20 in accordance with a short-range wireless communication standard such as Bluetooth (registered trademark).
  • the headset 21 is preferably configured as an open-type headset so that the user who wears it can hear external environmental sounds.
  • In the communication system 300 configured in this way, VoIP servers 11 are installed in each region according to the usage of the group call service, and the API server 10 can centrally manage the calls handled by those VoIP servers 11. Connections between clients 2 across multiple regions can therefore be operated efficiently while reducing communication delay.
  • FIG. 2 is a diagram showing a schematic functional configuration of the API server 10.
  • the API server 10 includes a call establishment control unit 100, a call quality control unit 110, a client management unit 120, a server management unit 130, and a call group management unit 140. These functional means are realized by the CPU controlling a storage device, an I / O circuit, and the like included in the server computer on which the API server 10 is mounted.
  • The call establishment control unit 100 is a functional means that, based on a group call start request from a client 2, performs control to start a group call between that client 2 and at least one other client 2 included in the group call start request. When the call establishment control unit 100 receives a group call start request from a client 2 and the requesting client 2 is not managed by the call group management unit 140 (described later), it instructs the call group management unit 140 to create a new call group. When the requesting client 2 is already managed by the call group management unit 140, it instructs the call group management unit 140 to add the clients 2 included in the group call start request to the call group that includes the requesting client 2.
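The create-or-join decision above can be sketched as follows. The `CallGroupManager` class and `handle_start_request` function are hypothetical stand-ins for the call group management unit 140 and the call establishment control unit 100; their names and the in-memory dictionaries are assumptions. The sketch does not handle an invited client that already belongs to a different group.

```python
class CallGroupManager:
    """Hypothetical stand-in for the call group management unit 140."""

    def __init__(self):
        self._groups = {}    # group id -> set of client ids
        self._group_of = {}  # client id -> group id
        self._next_id = 1

    def group_of(self, client_id):
        """Return the group id the client belongs to, or None."""
        return self._group_of.get(client_id)

    def create_group(self, client_ids):
        group_id = self._next_id
        self._next_id += 1
        self._groups[group_id] = set()
        self.add_clients(group_id, client_ids)
        return group_id

    def add_clients(self, group_id, client_ids):
        for client_id in client_ids:
            self._groups[group_id].add(client_id)
            self._group_of[client_id] = group_id

    def members(self, group_id):
        return set(self._groups[group_id])


def handle_start_request(manager, requester, invited):
    """Create a new group for an unmanaged requester, else extend its group."""
    group_id = manager.group_of(requester)
    if group_id is None:
        return manager.create_group([requester, *invited])
    manager.add_clients(group_id, invited)
    return group_id
```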
  • When the call establishment control unit 100 instructs the call group management unit 140 to create a new call group, it communicates with the plurality of clients 2 participating in the new call group and identifies the geographical position of each client 2. The call establishment control unit 100 may identify the geographical position of a client 2 based on the IP address of the client 2, or based on information from a position specifying means such as GPS provided in the mobile communication terminal 20 constituting the client 2. Once it has identified the geographical positions of the clients 2 participating in the new call group, the call establishment control unit 100 detects, among the servers 1 managed by the server management unit 130 (described later) that can be connected with low latency from the identified positions, a server 1 having a VoIP server 11 with a low operating rate. The call establishment control unit 100 then instructs the plurality of clients 2 to start a group call via the VoIP server 11 included in the detected server 1.
  • the call quality control unit 110 is a functional unit that controls communication quality between a plurality of clients 2 participating in a group call.
  • The call quality control unit 110 monitors the data transfer delay state of the clients 2 in the group calls managed by the call group management unit 140. When a data transfer delay occurs at a certain client 2, that is, when the communication line condition of that client 2 deteriorates, it commands the other clients 2 participating in the group call to suppress their data quality, reducing the amount of data so that the affected client 2 can maintain communication.
  • The call quality control unit 110 may monitor the data transfer delay state of the clients 2 by acquiring the communication state of each client 2, at a predetermined cycle, from the VoIP server 11 that controls the group call. When the client 2 in which the data transfer delay occurred recovers, the call quality control unit 110 commands the other clients 2 participating in the group call to cancel the suppression of data quality.
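One polling cycle of the suppress-and-restore behavior described above might look like the following sketch. The status dictionary layout, the command strings, and the 300 ms threshold are all assumptions chosen for illustration; the source does not specify how delay is measured or what the commands contain.

```python
def plan_quality_commands(statuses, suppressed, delay_threshold_ms=300.0):
    """Decide which clients to command in one monitoring cycle.

    statuses:   client id -> {"delay_ms": float, "connected": bool}
                (hypothetical shape of what is polled from the VoIP server)
    suppressed: set of client ids currently running at reduced quality
    Returns a list of (client id, command) tuples.
    """
    connected = {c for c, s in statuses.items() if s["connected"]}
    delayed = {c for c in connected if statuses[c]["delay_ms"] > delay_threshold_ms}

    commands = []
    for client in sorted(connected):
        # A client lowers its own quality when some OTHER client is struggling.
        others_delayed = bool(delayed - {client})
        if others_delayed and client not in suppressed:
            commands.append((client, "suppress_quality"))
        elif not others_delayed and client in suppressed:
            commands.append((client, "restore_quality"))
    return commands
```

The caller is expected to update `suppressed` after the commands are delivered, so that repeated cycles do not resend the same command.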
  • When communication with a client 2 is interrupted, that is, when the client 2 becomes unable to communicate due to weak radio waves or the like, the call quality control unit 110 notifies the other clients 2 participating in the group call that communication with that client 2 has been interrupted.
  • The call quality control unit 110 may detect that the communication of a client 2 has been interrupted by acquiring the communication status of each client 2, at a predetermined cycle, from the VoIP server 11 that controls the group call.
  • When the call quality control unit 110 detects that communication with the interrupted client 2 has recovered, it notifies the other clients 2 participating in the group call to that effect and controls the recovered client 2 to rejoin the group call.
  • the client management unit 120 is a functional unit that manages client information that is information related to the client 2 that makes a group call.
  • the client information managed by the client management unit 120 includes at least identification information for uniquely identifying the client 2 corresponding to the client information, and further information such as the name of the user having the client 2 corresponding to the client information Alternatively, information relating to the geographical position of the client 2 corresponding to the client information may be included.
  • the client management unit 120 receives a client information registration request, a client information request, a client information deletion request, and the like from the client 2 in the same manner as a generally provided service, and performs client information registration, correction, deletion, etc. Processing may be performed.
  • the server management unit 130 is a functional unit that manages server information that is information related to the server 1 including the VoIP server 11 that can be commanded and controlled from the API server 10.
  • the server information managed by the server management unit 130 includes at least the geographical location of the server and the location (IP address, etc.) of the server on the network, and the operation rate of the VoIP server 11 provided in the server, Information related to the server administrator may be included.
  • the server management unit 130 may perform processing such as registration, correction, and deletion of server information in response to a server information registration operation, a server information correction operation, a server information deletion operation, and the like performed by the administrator of the API server 10. .
  • the call group management unit 140 is a functional unit that manages call group information, which is information related to a group of clients 2 that are currently making a group call (hereinafter referred to as “client group”).
  • the call group information managed by the call group management unit 140 includes at least information for identifying the client 2 participating in the group call corresponding to the call group information (identification information registered in the client information related to the client 2).
  • the call group information includes information related to the VoIP server used for the group call, and the call group information includes the communication status (data delay status, communication disruption status, etc.) of each client 2 participating in the group call. .
  • the call group management unit 140 receives a call group creation command, a call group deletion command, a call group correction command, etc. from the call establishment control unit 100 and the call quality control unit 110, and creates, corrects, deletes call group information, etc. Processing may be performed.
  • With the above configuration, the API server 10 can distribute the group call requests from the clients 2 to a VoIP server 11 that can be connected with low delay, based on the position of each client 2 participating in the group call and the operating rate of each VoIP server 11.
  • the API server 10 of the present embodiment detects the alive state of each client 2 that makes a group call via the VoIP server 11 installed in each region, and performs failover processing according to the situation. It is possible to provide an optimal group call service according to the situation without bothering the user.
  • FIG. 3 is a diagram showing a schematic functional configuration of the mobile communication terminal 20.
  • The mobile communication terminal 20 includes a group call management unit 201, a group call control unit 202, a noise estimation unit 203, an utterance candidate determination unit 204, an utterance determination unit 205, an audio data transmission unit 206, a reproduced audio data transmission unit 207, a communication unit 208, and a short-range wireless communication unit 209. These functional units are realized by the CPU controlling a storage device, an I/O circuit, and the like included in the mobile communication terminal 20.
  • the group call management unit 201 is a functional unit that exchanges information related to group call management with the API server 10 via the communication unit 208 and manages the start and end of the group call.
  • The group call management unit 201 transmits various requests, such as a group call start request, a client addition request, and a group call end request, to the API server 10, and manages the group call by instructing the group call control unit 202 (described later) according to the response of the API server 10 to each request.
  • The group call control unit 202 is a functional means that, based on instructions from the group call management unit 201, controls the transmission and reception of audio data to and from the other clients 2 participating in the group call and the transmission and reception of audio data to and from the headset 21.
  • the group call control unit 202 detects the utterance of the voice data related to the user's utterance received from the headset 21 by the noise estimation unit 203, the utterance candidate determination unit 204, and the utterance determination unit 205 described later, and controls the data quality of the voice data. I do.
  • the noise estimation unit 203 is a functional unit that estimates an average environmental sound from voice data related to a user's utterance received from the headset 21.
  • the voice data concerning the user's utterance received from the headset 21 includes the user's utterance and the environmental sound.
  • As its noise estimation method, the noise estimation unit 203 may use a known method such as minimum mean square error (MMSE) estimation, the maximum likelihood method, or maximum a posteriori probability estimation.
  • For example, the noise estimation unit 203 may sequentially update the power spectrum of the environmental sound under the MMSE criterion, based on a speech presence probability estimated for each sample frame, and use that power spectrum to estimate the environmental sound that constitutes noise in the voice data.
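A per-frame update of this kind can be sketched as below. This is a simplified illustration of speech-presence-weighted MMSE noise tracking, not the source's exact procedure: the fixed 15 dB a priori SNR (`XI_H1`), the equal prior for speech presence and absence, and the smoothing factor are assumptions. Inputs are lists of per-frequency-bin power values.

```python
from math import exp

# Assumed fixed a priori SNR under speech presence (15 dB), a tuning choice.
XI_H1 = 10 ** (15 / 10)


def speech_presence_probability(power, noise_power):
    """Posterior probability that a bin contains speech, assuming equal priors."""
    snr_post = power / max(noise_power, 1e-12)
    return 1.0 / (1.0 + (1.0 + XI_H1) * exp(-snr_post * XI_H1 / (1.0 + XI_H1)))


def update_noise_spectrum(noise_psd, frame_psd, smoothing=0.8):
    """Return the noise power spectrum updated with one new frame."""
    updated = []
    for noise, power in zip(noise_psd, frame_psd):
        p = speech_presence_probability(power, noise)
        # MMSE-style estimate of the noise power in this frame: trust the
        # observation when speech is unlikely, the old estimate otherwise.
        noise_estimate = (1.0 - p) * power + p * noise
        # Recursive smoothing keeps the tracker stable across frames.
        updated.append(smoothing * noise + (1.0 - smoothing) * noise_estimate)
    return updated
```

With this weighting, a loud speech burst barely moves the noise estimate, while a modest, sustained rise in background level is absorbed over a few frames.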
  • the utterance candidate determination unit 204 is a functional unit that determines a sound different from the average environmental sound from the sound data as an utterance candidate based on the estimation result of the environmental sound that becomes noise by the noise estimation unit 203.
  • The utterance candidate determination unit 204 compares the long-term spectral fluctuation, in units of several frames, with the power spectrum of the environmental sound estimated by the noise estimation unit 203, and judges the non-stationary portion of the audio data to be voice data produced by the user's utterance.
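One known measure that compares a multi-frame spectral envelope with a noise spectrum is the long-term spectral divergence; the sketch below uses it as a stand-in, since the source does not name a specific measure. The window `order`, the 6 dB threshold, and the function names are assumptions. Inputs are per-bin power values, so the ratio is taken directly without squaring.

```python
from math import log10


def long_term_spectral_divergence(frames_psd, noise_psd, index, order=3):
    """Divergence in dB between a multi-frame envelope and the noise spectrum."""
    lo = max(0, index - order)
    hi = min(len(frames_psd), index + order + 1)
    # Per-bin maximum over the surrounding frames (long-term envelope).
    envelope = [
        max(frame[k] for frame in frames_psd[lo:hi])
        for k in range(len(noise_psd))
    ]
    ratio = sum(e / max(n, 1e-12) for e, n in zip(envelope, noise_psd)) / len(noise_psd)
    return 10.0 * log10(max(ratio, 1e-12))


def utterance_candidates(frames_psd, noise_psd, threshold_db=6.0, order=3):
    """Flag each frame whose long-term divergence exceeds the threshold."""
    return [
        long_term_spectral_divergence(frames_psd, noise_psd, i, order) > threshold_db
        for i in range(len(frames_psd))
    ]
```

Because the envelope spans neighboring frames, a short burst also marks the frames just before and after it, which helps avoid clipping the start and end of an utterance.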
  • The utterance determination unit 205 is a functional means that identifies, within the portion that the utterance candidate determination unit 204 judged to be voice data produced by the user's utterance, the portion of audio data estimated to be a sudden environmental sound other than a human voice.
  • For the portion that the utterance candidate determination unit 204 judged to be voice data produced by the user's utterance, the utterance determination unit 205 estimates the content ratio of spectral periodic components and thereby determines whether the voice data is based on voice uttered from a human throat or the like.
  • the utterance determination unit 205 evaluates the distance from the utterer or whether it is a direct wave by estimating the degree of echo from the speech waveform, and whether or not the speech data is based on the speech uttered by the speaker. Determine.
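As a rough illustration of a periodicity test, the sketch below uses the peak of the normalized autocorrelation over plausible pitch lags. This is a swapped-in proxy for the "content ratio of spectral periodic components", which the source does not define further; the 50 to 400 Hz pitch range, the 0.5 threshold, and the demo signals are assumptions. The echo-based distance check is not covered.

```python
import random
from math import pi, sin


def periodicity(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Peak of the normalized autocorrelation over plausible pitch lags."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        best = max(best, corr / energy)
    return best


def looks_like_voice(samples, sample_rate, threshold=0.5):
    """Treat strongly periodic audio as likely voiced speech."""
    return periodicity(samples, sample_rate) >= threshold


# Demo signals: a 200 Hz tone (strongly periodic) and white noise.
RATE = 8000
tone = [sin(2 * pi * 200 * n / RATE) for n in range(400)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]
```

Calling `looks_like_voice(tone, RATE)` and `looks_like_voice(noise, RATE)` shows the intended contrast between periodic and aperiodic input. Unvoiced consonants are not periodic, so a practical detector would combine this with other cues.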
  • The audio data transmission unit 206 encodes the audio data in the range that remains after excluding, from the range determined as an utterance candidate by the utterance candidate determination unit 204, the portion that the utterance determination unit 205 determined to be a sudden environmental sound, and transmits it to the VoIP server 11.
  • the group call control unit 202 encodes the audio data with the encoding method and communication quality determined based on the command from the communication quality control unit 110 of the API server 10. .
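The selection of what to encode can be sketched as frame bookkeeping: keep candidate frames that were not judged to be sudden environmental sound, then group them into contiguous segments. The per-frame boolean flags and both function names are assumptions for illustration; encoding and transmission are left out.

```python
def frames_to_send(candidate_flags, sudden_noise_flags):
    """Keep utterance-candidate frames not judged to be sudden noise."""
    return [c and not s for c, s in zip(candidate_flags, sudden_noise_flags)]


def speech_segments(is_speech):
    """Group consecutive True flags into (start, end) frame ranges, end exclusive."""
    segments = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(is_speech)))
    return segments
```

Only the frames inside the returned ranges would be handed to the encoder, which is how silence and noise-only stretches are kept off the network.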
  • the reproduction voice data transmission unit 207 transmits the decoded voice data received from the VoIP server via the communication unit 208 to the headset 21 via the short-range wireless communication unit 209.
  • the communication unit 208 is a functional unit that controls communication via the mobile network.
  • the communication unit 208 is realized using a communication interface for a general mobile communication network or the like.
  • The short-range wireless communication unit 209 is a functional unit that controls short-range wireless communication such as Bluetooth (registered trademark).
  • the short-range wireless communication unit 209 is realized using a general short-range wireless communication interface.
  • FIG. 4 is a diagram showing a schematic functional configuration of the headset 21.
  • the headset 21 includes a voice detection unit 211, a speech enhancement unit 212, a playback control unit 213, and a short-range wireless communication unit 216. These functional units are realized by the CPU controlling a storage device, an I / O circuit, and the like included in the headset 21.
  • the voice detection unit 211 is a functional unit that detects the speech of the user wearing the headset 21 and converts it into voice data.
  • The voice detection unit 211 is composed of a microphone, an A/D conversion circuit, an encoder for voice data, and the like included in the headset 21. It is desirable that the voice detection unit 211 be provided with at least two microphones.
  • The speech enhancement unit 212 is a functional means that emphasizes the utterance of the user wearing the headset 21 in the voice data detected and converted by the voice detection unit 211 so that it can be detected.
  • The speech enhancement unit 212 emphasizes the user's utterance relative to the environmental sound using, for example, a known beamforming algorithm.
  • The processing performed by the speech enhancement unit 212 suppresses the environmental sound in the voice data relative to the user's utterance, which improves sound quality and makes it possible to improve the performance and reduce the computational load of subsequent signal processing.
  • The voice data converted by the speech enhancement unit 212 is transmitted to the mobile communication terminal 20 via the short-range wireless communication unit 216.
  • the playback control unit 213 is a functional unit that plays back audio data received from the mobile communication terminal 20 via the short-range wireless communication unit 216.
  • the reproduction control unit 213 includes an audio data decoder, a D / A conversion circuit, a speaker, and the like included in the headset 21.
  • the reproduction control unit 213 listens to the user for audio data to be reproduced based on the environmental sound detected by the microphone included in the headset 21 when reproducing the audio in the speech period in the audio data received from the mobile communication terminal 20. Play in a form that is easy to do.
  • the reproduction control unit 213 may perform noise canceling processing based on the ambient noise estimated by the voice detection unit to cancel the environmental sound heard by the user and make it easier to hear the reproduced sound. A process of increasing the playback volume in conjunction with the size of the sound may be performed to make it easier to hear the playback sound.
  • the client 2 of the present embodiment having the above-described configuration reduces the size of audio data transmitted to the communication path by performing multifaceted audio data processing that links various estimation processes related to speech and environmental sound. However, clear utterance playback is possible. Thereby, labor saving of power consumption in each device constituting the client 2 and a significant improvement in UX (User Experience) can be realized.
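As one illustration of the beamforming mentioned for the speech enhancement unit 212, the following is a minimal delay-and-sum sketch in Python. It assumes two or more microphone channels and steering delays, in whole samples, that are already known. The function name `delay_and_sum` is hypothetical, and the source does not state which beamforming algorithm the headset 21 actually uses.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer (illustrative sketch only).

    channels: list of equal-rate sample lists, one per microphone.
    delays:   per-channel arrival delay of the wanted speech, in samples
              (assumed known here; a real device would have to estimate it).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay is required per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Number of output samples for which every shifted channel has data.
    length = max(0, min(len(ch) - d for ch, d in zip(channels, delays)))
    out = []
    for n in range(length):
        # Advance each channel by its delay so the speech lines up, then average.
        aligned = [ch[n + d] for ch, d in zip(channels, delays)]
        out.append(sum(aligned) / len(aligned))
    return out
```

Averaging the aligned channels preserves the user's speech, which adds coherently, while environmental sound that differs between the microphones tends to average down.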
  • FIG. 5 is a sequence chart showing a flow of processing executed on the headset and the mobile communication terminal related to the speech detection function.
  • [Step SA01] The voice detection unit 211 detects the user's utterance, together with the environmental sound, as voice and converts it into voice data.
  • [Step SA02] The speech enhancement unit 212 emphasizes the user's utterance in the voice data converted in step SA01 relative to the environmental sound.
  • [Step SA03] The short-range wireless communication unit 216 transmits the voice data processed in step SA02 to the first mobile communication terminal 20.
  • [Step SA04] The noise estimation unit 203 analyzes the voice data received from the first headset and estimates the environmental sound that constitutes noise in the voice data.
  • [Step SA05] Based on the noise estimate produced by the noise estimation unit 203 in step SA04, the utterance candidate determination unit 204 identifies sounds in the voice data that differ from the average environmental sound as utterance candidates.
  • [Step SA06] From the voice data identified as utterance candidates in step SA05, the utterance determination unit 205 identifies the portions estimated to be sudden environmental sounds or speech uttered from a position at a distance from the headset microphone.
  • [Step SA07] The group call control unit 202 takes the range identified as utterance candidates in step SA05, excludes the portions that the utterance determination unit 205 identified in step SA06 as sudden environmental sounds or as speech uttered away from the headset microphone, encodes the remaining voice data with the encoding method and communication quality determined through the exchange with the VoIP server 11, and transmits the encoded voice data to the VoIP server.
  • FIG. 6 illustrates the conversion by which the voice data to be transmitted is generated from the voice detected according to the sequence chart of FIG. 5.
  • In the communication system of the present invention, only the portion needed to reproduce the utterance is extracted from the detected voice, so the voice data encoded and transmitted to the VoIP server 11 can be smaller than the voice data transmitted in an ordinary communication system.
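The extraction flow of steps SA04 to SA07 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method of the source: the noise floor is taken as the median frame energy, a frame is an utterance candidate when its energy exceeds the floor by a hypothetical ratio, and "sudden environmental sounds" are approximated as candidate runs shorter than `min_frames`. The check for speech uttered away from the microphone is omitted. All function names and thresholds are hypothetical.

```python
from statistics import median


def frame_energies(samples, frame_len):
    """Mean energy of each complete, non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def candidate_frames(energies, ratio=4.0):
    """Flag frames that stand out from the estimated noise floor (cf. SA04, SA05)."""
    noise_floor = median(energies)  # assumption: median stands in for the average environmental sound
    return [e > ratio * noise_floor for e in energies]


def drop_short_bursts(flags, min_frames=2):
    """Discard candidate runs too short to be speech (rough stand-in for SA06)."""
    kept = [False] * len(flags)
    start = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                for j in range(start, i):
                    kept[j] = True
            start = None
    return kept


def utterance_frames(samples, frame_len=4, ratio=4.0, min_frames=2):
    """Indices of the frames that would be encoded and sent (cf. SA07)."""
    flags = candidate_frames(frame_energies(samples, frame_len), ratio)
    return [i for i, keep in enumerate(drop_short_bursts(flags, min_frames)) if keep]
```

Only the returned frames would be passed to the encoder, which is what keeps the transmitted voice data small.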
  • FIG. 7 is a sequence chart showing a flow of processing executed on the headset and the mobile communication terminal related to the audio reproduction control function.
  • [Step SB01] The group call control unit 202 decodes the received data into voice data, using the encoding method determined through the exchange with the VoIP server 11.
  • [Step SB02] The reproduction voice data transmission unit 207 transmits the voice data decoded in step SB01 to the second headset 21.
  • [Step SB03] The voice detection unit 211 detects the environmental sound as voice and converts it into voice data.
  • [Step SB04] In the speech section of the voice data received from the second mobile communication terminal, the playback control unit 213 applies processing that makes the playback sound easier to hear against the environmental sound detected in step SB03, and plays the voice data back.
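One of the playback options described for the playback control unit 213, raising the playback volume in step with the ambient noise level, might look like the sketch below. The reference level `quiet_rms`, the gain ceiling `max_gain`, and the function names are assumptions made for illustration; the source gives no concrete values.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def playback_gain(ambient, quiet_rms=0.05, max_gain=4.0):
    """Gain that grows with the ambient level, clamped to the range [1.0, max_gain]."""
    return min(max_gain, max(1.0, rms(ambient) / quiet_rms))


def apply_gain(speech, gain):
    """Scale the speech section and clip to the normalized range [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, x * gain)) for x in speech]
```

Applying the gain only to the speech section, as the step above describes, leaves the user able to hear the surroundings between utterances.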
  • FIG. 8 is a sequence chart showing a flow of processing executed on the API server, the VoIP server, and the mobile communication terminal related to the communication control function when a data transfer delay occurs.
  • [Step SC01] The VoIP server 11 detects a data transfer delay at the second mobile communication terminal 20.
  • [Step SC02] The VoIP server 11 notifies the API server 10 of the data transfer delay status of the second mobile communication terminal 20.
  • [Step SC03] The communication quality control unit 110 determines a communication quality suited to the data transfer delay status of the second mobile communication terminal 20 notified by the VoIP server 11, and commands the VoIP server 11 and the first mobile communication terminal 20, which belongs to the same client group as the second mobile communication terminal 20, to use the determined communication quality.
  • [Step SC04] The VoIP server 11 changes the communication quality of the client group to which the second mobile communication terminal 20 belongs to the communication quality commanded in step SC03.
  • [Step SC05] The first mobile communication terminal 20 changes its communication quality to the communication quality commanded in step SC03.
  • FIG. 9 is a sequence chart showing a flow of processing executed on the API server, the VoIP server, and the mobile communication terminal related to the communication control function when the data transfer state is recovered.
  • [Step SD01] The VoIP server 11 detects that the data transfer status of the second mobile communication terminal 20 has recovered.
  • [Step SD02] The VoIP server 11 notifies the API server 10 of the recovery of the data transfer status of the second mobile communication terminal 20.
  • [Step SD03] In response to the recovery of the data transfer status of the second mobile communication terminal 20 notified by the VoIP server 11, the communication quality control unit 110 commands the VoIP server 11 and the first mobile communication terminal 20, which belongs to the same client group as the second mobile communication terminal 20, to restore the communication quality.
  • [Step SD04] The VoIP server 11 restores the communication quality of the client group to which the second mobile communication terminal 20 belongs.
  • [Step SD05] The first mobile communication terminal 20 restores its communication quality.
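The control flows of FIG. 8 and FIG. 9 can be illustrated with a small sketch of the communication quality control unit 110. The bitrates, the `"voip-server"` label, and the class and method names are hypothetical. The rule that a group stays degraded until every delayed terminal has recovered is also an assumption, since the source does not say how several delayed terminals are handled.

```python
from dataclasses import dataclass, field

NORMAL_KBPS = 32    # hypothetical normal-quality bitrate
DEGRADED_KBPS = 12  # hypothetical bitrate used while a member is delayed


@dataclass
class QualityController:
    groups: dict                                  # group id -> set of terminal ids
    delayed: dict = field(default_factory=dict)   # group id -> set of delayed terminal ids

    def _group_of(self, terminal):
        for gid, members in self.groups.items():
            if terminal in members:
                return gid
        raise KeyError(terminal)

    def _commands(self, gid, reporter):
        # Command the VoIP server and the other terminals of the same group.
        kbps = DEGRADED_KBPS if self.delayed.get(gid) else NORMAL_KBPS
        targets = ["voip-server"] + sorted(self.groups[gid] - {reporter})
        return [(target, kbps) for target in targets]

    def on_delay(self, terminal):
        """Cf. step SC03: lower the quality for the delayed terminal's group."""
        gid = self._group_of(terminal)
        self.delayed.setdefault(gid, set()).add(terminal)
        return self._commands(gid, terminal)

    def on_recovery(self, terminal):
        """Cf. step SD03: restore the quality once no member is delayed."""
        gid = self._group_of(terminal)
        self.delayed.get(gid, set()).discard(terminal)
        return self._commands(gid, terminal)
```

Each returned pair is a target and the bitrate it is commanded to use, so the caller can forward the commands to the VoIP server 11 and the affected terminals.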
  • FIG. 10 is a sequence chart showing a flow of processing executed on the API server, the VoIP server, and the mobile communication terminal related to the communication control function when communication interruption occurs.
  • The VoIP server 11 detects that communication with the second mobile communication terminal 20 has been interrupted.
  • The VoIP server 11 notifies the API server 10 of the communication interruption of the second mobile communication terminal 20.
  • The communication quality control unit 110 notifies the first mobile communication terminal 20, which belongs to the same client group as the second mobile communication terminal 20, that communication with the second mobile communication terminal 20 has been interrupted.
  • The VoIP server 11 changes the information on the communication state of the second mobile communication terminal 20 to the communication interruption state.
  • The first mobile communication terminal 20 changes the information on the communication state of the second mobile communication terminal 20 to the communication interruption state.
  • FIG. 11 is a sequence chart showing a flow of processing executed on the API server, the VoIP server, and the mobile communication terminal related to the communication control function when communication interruption occurs.
  • The VoIP server 11 detects that the communication status of the second mobile communication terminal 20 has recovered.
  • The VoIP server 11 notifies the API server 10 of the recovery of the communication status of the second mobile communication terminal 20.
  • The communication quality control unit 110 notifies the first mobile communication terminal 20, which belongs to the same client group as the second mobile communication terminal 20, that communication with the second mobile communication terminal 20 has recovered.
  • The VoIP server 11 changes the information on the communication state of the second mobile communication terminal 20 to the normal state.
  • The first mobile communication terminal 20 changes the information on the communication state of the second mobile communication terminal 20 to the normal state.
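The state bookkeeping described for communication interruption and recovery can be pictured with the sketch below. It assumes that the VoIP server and each terminal hold a table of the communication state of the other group members; the table layout, the `LinkState` names, and `notify_state` are hypothetical.

```python
from enum import Enum


class LinkState(Enum):
    NORMAL = "normal"
    INTERRUPTED = "interrupted"


def notify_state(tables, subject, state):
    """Record `state` for `subject` in every holder's table except the subject's own.

    tables: holder id -> {peer id -> LinkState}
    Returns the holders whose tables were updated, in table order.
    """
    updated = []
    for holder, table in tables.items():
        if holder != subject and subject in table:
            table[subject] = state
            updated.append(holder)
    return updated


# Example: the second terminal ("t2") drops out and later comes back.
tables = {
    "voip-server": {"t1": LinkState.NORMAL, "t2": LinkState.NORMAL},
    "t1": {"t2": LinkState.NORMAL},
    "t2": {"t1": LinkState.NORMAL},
}
notify_state(tables, "t2", LinkState.INTERRUPTED)
notify_state(tables, "t2", LinkState.NORMAL)
```

Keeping the state per peer lets the first terminal show that the second terminal is unreachable without ending the group call.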

Abstract

Provided is a communication system with which it is possible to carry out a comfortable group call even in a low-intensity radio wave environment and an environment having large environmental sound. This communication system 300 is provided with an API server 10 for managing a group call, a portable communication terminal 20 for performing communication via a portable communication network, and a headset 21 for exchanging voice data with the portable communication terminal 20 via short-distance wireless communication. The headset 21 is provided with an utterance enhancement unit for enhancing an utterance part included in voice relative to environmental sound, and the portable communication terminal 20 extracts the utterance part from the voice data received from the headset 21 and transmits the utterance part to a partner in the group call. Communication between the portable communication terminals 20 is controlled by means of a command related to the control of communication quality from the API server 10.

Description

Communication system, API server used in communication system, headset, and portable communication terminal

The present invention relates to a communication system, and to an API server, a headset, and a mobile communication terminal used in the communication system.

In a communication system for many-to-many calls within a group over a mobile communication network, the mobile communication terminals of the callers are registered as a group before the call starts. The many-to-many call within the group is then carried out by exchanging voice data, in which each speaker's voice is encoded, among the mobile communication terminals registered in the group.

When a group call is made over a mobile communication network, voice data encoding the speaker's voice is transmitted via a VoIP server to the mobile communication terminals of the callers participating in the group. Transmitting voice data via the VoIP server in this way reduces, to some extent, the communication load caused by many-to-many calls within the group.

When making a call over a mobile communication network, a caller may use the microphone and speaker built into the mobile communication terminal, or may instead use a headset connected to the terminal by a short-range communication method such as Bluetooth (registered trademark). With a headset, the mobile communication terminal can pick up the caller's voice even when the caller is not holding the terminal in hand, and can likewise convey to the caller the conversation sent from the other party's mobile communication terminal.

When a many-to-many group call is made using the conventional techniques described above and all participants are in a good communication environment, they can converse while exchanging high-quality, low-delay voice with one another.

However, a participant may be in an environment with poor radio conditions far from base stations, such as a snowy mountain, the sea, a construction site, a quarry, or an airfield, or in an environment where many mobile communication terminal users are crowded together and radio waves are hard to pick up. In such cases, encoding the voice as-is into voice data and transmitting it makes the size of the voice data to be sent per unit time large relative to the available bandwidth. This causes transmission delay of the voice data and makes it difficult to continue a comfortable call. Raising the compression ratio of the voice encoding, so that the voice data to be sent per unit time is appropriately sized for the available bandwidth, can reduce the transmission delay to some extent. The sound quality obtained when the transmitted voice data is decoded then deteriorates, however, so a good conversation is still not possible.

Environmental sounds such as wind noise, crowd noise, construction noise, mining noise, and engine noise can also be a problem in calls made on snowy mountains, at sea, in crowds, at construction sites, quarries, airfields, and the like. In a call made in such an environment, the microphone picks up the surrounding environmental sound in addition to the speaker's utterance, and voice in which the two are mixed is encoded into voice data and transmitted to the mobile communication terminals of the call participants. The environmental sound not only lowers the signal-to-noise ratio but also causes unnecessary voice data, containing only environmental sound and no utterance, to be transmitted, which can lead to data delay and similar problems.

Furthermore, people making calls on snowy mountains, at sea, in crowds, at construction sites, quarries, airfields, and the like are often in the middle of an activity other than the call, such as operating equipment or playing a sport. In such situations a transceiver is generally used, and the user must press a button during each utterance to transmit explicitly. This button operation interferes with the activity the user should be carrying out.

In addition, when the receiving mobile communication terminal decodes the received voice data and plays back the voice, the environmental sound on the receiving side may make the played-back voice inaudible. Noise-canceling techniques could be applied to the environmental sound to prevent this. However, uniformly canceling environmental sound on snowy mountains, at sea, in crowds, at construction sites, quarries, airfields, and the like may delay the caller's awareness of dangers arising nearby.

In addition to the above problems, using a computationally heavy voice encoding method during a group call with a headset and a mobile communication terminal drains the batteries of both devices quickly, so the group call cannot be continued for long. In particular, a headset is often a small device worn on the ear, with a smaller battery capacity than a mobile communication terminal. The headset and the mobile communication terminal therefore need to divide roles appropriately and to encode voice efficiently by combining algorithms with low computational load.

An object of the present invention is therefore to provide a communication system that enables a comfortable group call even in weak radio wave environments and in environments with loud environmental sound.

The communication system of the present invention comprises the following three means and solves the above problems, which arise in many-to-many communication within a group, by linking them with one another.
Means 1) Means for extracting the human speech portion, with high accuracy, from the voice detected by the headset and generating voice data
Means 2) Dynamic communication quality control means adapted to weak radio wave environments
Means 3) Noise-resistant playback control means that takes the environment into account

Specifically, the present invention is a communication system in which a group call is carried out among a plurality of clients via a VoIP server, the communication system comprising an API server that manages the group call. Each client comprises a mobile communication terminal that communicates via a mobile communication network, and a headset that exchanges voice data with the mobile communication terminal by short-range wireless communication.
The headset comprises: a voice detection unit that detects voice; a speech enhancement unit that emphasizes the utterance portion included in the voice detected by the voice detection unit relative to environmental sound; and a playback control unit that plays back the voice data received from the mobile communication terminal so that, in the utterance portion of that voice data, the voice is relatively easy to hear against the ambient noise detected by the voice detection unit.
The mobile communication terminal comprises: a noise estimation unit that estimates the noise included in the voice data received from the headset; an utterance candidate determination unit that, based on the result of the estimation by the noise estimation unit, determines the range of the voice data that is a candidate for an utterance portion; an utterance determination unit that determines, within the candidate range determined by the utterance candidate determination unit, the portion that is human voice; a voice data transmission unit that transmits the portion of the voice data determined to be human voice by the utterance determination unit to the VoIP server; and a reproduction voice data transmission unit that transmits the voice data received from the VoIP server to the headset.
The API server comprises a communication quality control unit that, based on the communication status between the clients and the VoIP server, notifies the clients and the VoIP server of commands for controlling the communication quality of the group call. The voice data transmission unit encodes the portion of the voice data determined to be human voice by the utterance determination unit, at a communication quality based on the command notified from the communication quality control unit, and transmits it to the VoIP server.

The present invention reduces the amount of data transferred over the mobile network in a many-to-many group call, which reduces power consumption in the mobile communication terminal and the headset and suppresses voice delay even when the communication bandwidth is insufficient. Furthermore, because only the utterance sections are detected automatically, noise is reduced hands-free, without interfering with other activities, and only the content of the other party's utterance is conveyed clearly, which greatly improves the UX (User Experience) of the call.

FIG. 1 is a schematic configuration diagram of a communication system according to an embodiment of the present invention.
FIG. 2 is a schematic functional block diagram of an API server according to an embodiment of the present invention.
FIG. 3 is a schematic functional block diagram of a mobile communication terminal according to an embodiment of the present invention.
FIG. 4 is a schematic functional block diagram of a headset according to an embodiment of the present invention.
FIG. 5 is a sequence chart showing the flow of processing executed on the headset and the mobile communication terminal for the utterance detection function according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the conversion by which the voice data to be transmitted is generated from the voice detected according to the sequence chart of FIG. 5.
FIG. 7 is a sequence chart showing the flow of processing executed on the headset and the mobile communication terminal for the voice playback control function according to an embodiment of the present invention.
FIG. 8 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminal for the communication control function when a data transfer delay occurs, according to an embodiment of the present invention.
FIG. 9 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminal for the communication control function when the data transfer status recovers, according to an embodiment of the present invention.
FIG. 10 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminal for the communication control function when a communication interruption occurs, according to an embodiment of the present invention.
FIG. 11 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminal for the communication control function when a communication interruption occurs, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
<1. Overall configuration of communication system>
FIG. 1 is a diagram showing a schematic configuration of the communication system of the present invention. The communication system 300 includes at least one server 1 and a plurality of clients 2 that can connect to the server 1 via a mobile network such as GSM (registered trademark), 3G (registered trademark), 4G (registered trademark), WCDMA (registered trademark), or LTE (registered trademark).

The server 1 includes at least a VoIP (Voice over Internet Protocol) server 11 for controlling voice communication among the clients 2, and at least one of the servers 1 in the communication system 300 includes an API (Application Programming Interface) server 10 that manages the connection of the clients 2 and the allocation of VoIP servers 11. The server 1 may consist of a single server computer, or of a plurality of server computers with the respective functions implemented on each. The servers 1 may also be distributed across regions around the world.

The server computer constituting the server 1 includes a CPU, storage devices such as ROM, RAM, and a hard disk (main and auxiliary storage devices), I/O circuits, and the like. The server 1 is connected to a wide area network according to a communication standard suited to wired communication, such as TCP/IP, and is configured to communicate with other servers 1 via the wide area network.

The API server 10 serves as a management server. When a many-to-many group call is made, it exchanges the information required for the group call with the clients 2 participating in it and, based on the information obtained, instructs the VoIP server 11 so that the group call is realized among those clients 2. The API server 10 is implemented on a server computer constituting the server 1.

The API server 10 can instruct not only the VoIP server 11 located in the same server 1 but also other VoIP servers 11 reachable over the network. This allows the API server 10 to identify the geographical location of each client 2 participating in a group call from information such as its IP address, select a VoIP server 11 to which the client 2 can connect with low latency, and assign the client 2 to that VoIP server 11. The API server 10 can also detect a VoIP server 11 with a low utilization rate among the plurality of VoIP servers 11 and assign clients 2 to it.
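The server assignment described above can be sketched as a simple selection rule. The sketch assumes that the client's region has already been derived from its IP address; the latency table, the load ceiling `max_load`, the server names, and the function name are all hypothetical, and the source does not specify how latency and utilization are weighed against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoipServer:
    name: str
    region: str
    load: float  # utilization rate, 0.0 to 1.0


# Hypothetical round-trip estimates in milliseconds: (client region, server region).
LATENCY_MS = {
    ("jp", "jp"): 20, ("jp", "us"): 120,
    ("us", "us"): 25, ("us", "jp"): 120,
}


def select_voip_server(servers, client_region, max_load=0.8):
    """Prefer the lowest estimated latency, then the lowest load.

    Servers at or above max_load are skipped unless every server is that busy.
    """
    usable = [s for s in servers if s.load < max_load] or list(servers)
    return min(
        usable,
        key=lambda s: (LATENCY_MS.get((client_region, s.region), float("inf")), s.load),
    )


servers = [
    VoipServer("tokyo-1", "jp", 0.9),
    VoipServer("tokyo-2", "jp", 0.3),
    VoipServer("oregon-1", "us", 0.1),
]
chosen = select_voip_server(servers, "jp")  # tokyo-2: nearby and lightly loaded
```

Ordering by latency first and load second is only one possible policy; a deployment could equally weight the two or cap latency and then balance load.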

On receiving instructions from the API server 10, the VoIP server 11 controls the exchange of voice packets (conversation) among the clients 2. The VoIP server 11 is implemented on a server computer constituting the server 1. It may be configured as a software switch of a known IP-PBX (Internet Protocol-Private Branch Exchange). To realize real-time calls among the clients 2, the VoIP server 11 has a function of processing voice packets in memory.

The client 2 includes a mobile communication terminal 20 carried by the user and a headset 21 connected to the mobile communication terminal 20 by short-range wireless communication such as Bluetooth.
The mobile communication terminal 20 controls the communication of voice packets in the user's voice calls. It is an information terminal, such as a tablet terminal or smartphone, designed with a size, shape, and weight that allow the user to carry it, and includes a CPU, storage devices such as ROM, RAM, and a memory card (main and auxiliary storage devices), I/O circuits, and the like.

The mobile communication terminal 20 is configured to communicate with the server 1 and other clients 2 over a wide area network connected to a base station (not shown), in accordance with a communication standard suited to long-range wireless communication, such as GSM (registered trademark), 3G (registered trademark), 4G (registered trademark), WCDMA (registered trademark), or LTE (registered trademark).

The mobile communication terminal 20 is configured to exchange audio data with the headset 21 in accordance with a short-range wireless communication standard such as Bluetooth (registered trademark) (hereinafter, the "first short-range wireless communication standard"). The mobile communication terminal 20 is also configured to communicate with other mobile communication terminals 20 nearby in accordance with a short-range wireless communication standard that allows communication with less power than the first short-range wireless communication standard, such as BLE (Bluetooth Low Energy) (registered trademark) (hereinafter, the "second short-range wireless communication standard").

The headset 21 creates audio data from the voice uttered by the user, transmits the created audio data to the mobile communication terminal 20, and plays back sound based on audio data transmitted from the mobile communication terminal 20. The headset 21 includes a CPU, storage devices such as ROM, RAM, and a memory card (main storage, auxiliary storage, and the like), and I/O circuits such as a microphone and a speaker. The headset 21 is configured to exchange audio data with the mobile communication terminal 20 in accordance with a short-range wireless communication standard such as Bluetooth (registered trademark). The headset 21 is preferably configured as an open-type headset so that the wearer can still hear external environmental sounds.

In the communication system 300 of this embodiment configured as described above, VoIP servers 11 can be installed in each region according to usage of the group call service, and calls handled by those VoIP servers 11 can be managed centrally by the API server 10. Connections between clients 2 across multiple regions can therefore be operated efficiently while reducing communication delay.

<2. Server functional configuration>
FIG. 2 is a diagram showing a schematic functional configuration of the API server 10. The API server 10 includes a call establishment control unit 100, a call quality control unit 110, a client management unit 120, a server management unit 130, and a call group management unit 140. These functional units are realized by the CPU controlling the storage devices, I/O circuits, and the like of the server computer on which the API server 10 is implemented.

The call establishment control unit 100 is a functional unit that, based on a group call start request from a client 2, performs control to start a group call between that client 2 and at least one other client 2 named in the request. On receiving a group call start request from a client 2, the call establishment control unit 100 proceeds as follows. If the requesting client 2 is not managed by the call group management unit 140 described later, it instructs the call group management unit 140 to create a new call group that includes the client 2. If the requesting client 2 is already managed by the call group management unit 140, it instructs the call group management unit 140 to add the clients 2 named in the request to the call group that includes the requesting client 2.
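
The branching described above can be sketched as follows. This is a minimal illustration only: the class `CallGroupManager`, the function `handle_group_call_start`, and the use of string client identifiers are hypothetical names introduced here and are not taken from the specification.

```python
class CallGroupManager:
    """Hypothetical in-memory stand-in for the call group management unit 140."""

    def __init__(self):
        self.groups = {}      # group id -> set of client ids
        self._next_id = 1

    def group_of(self, client_id):
        """Return the id of the group containing client_id, or None."""
        for group_id, members in self.groups.items():
            if client_id in members:
                return group_id
        return None

    def create_group(self, members):
        group_id = self._next_id
        self._next_id += 1
        self.groups[group_id] = set(members)
        return group_id

    def add_clients(self, group_id, members):
        self.groups[group_id].update(members)


def handle_group_call_start(manager, requester, invited):
    """Create a new group, or extend the requester's existing group."""
    group_id = manager.group_of(requester)
    if group_id is None:
        # Requester is not yet managed: create a group that includes it.
        return manager.create_group([requester, *invited])
    # Requester already belongs to a group: add the invited clients to it.
    manager.add_clients(group_id, invited)
    return group_id
```

A second request from a client that is already in a call therefore extends the existing group rather than opening a new one.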

When instructing the call group management unit 140 to create a new call group, the call establishment control unit 100 communicates with the plurality of clients 2 that will join the new call group and identifies the geographic location of each client 2. The call establishment control unit 100 may identify the geographic location of a client 2 from its IP address, or from information supplied by a positioning means such as GPS provided in the mobile communication terminal 20 of the client 2.
Having identified the geographic locations of the clients 2 joining the new call group, the call establishment control unit 100 extracts, from the servers 1 managed by the server management unit 130 described later, one or more servers 1 located in a region that allows low-latency connection from the identified client locations, and then detects among them a server 1 whose VoIP server 11 has a low utilization rate. The call establishment control unit 100 then instructs the clients 2 to start the group call via the VoIP server 11 of the detected server 1.
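
The two-stage selection (first filter by region, then pick the least utilized server) might be sketched as below. The specification does not say how "low-latency region" is measured; this sketch assumes great-circle distance as a stand-in for latency, and the names `VoipServer`, `select_server`, `max_km`, and the example server names are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class VoipServer:
    name: str
    lat: float
    lon: float
    utilization: float  # 0.0 (idle) to 1.0 (fully loaded)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def select_server(clients, servers, max_km=2000.0):
    """clients is a list of (lat, lon) pairs; distance is an assumed proxy for latency."""
    def worst_distance(server):
        return max(haversine_km(lat, lon, server.lat, server.lon) for lat, lon in clients)

    # Stage 1: keep servers that are close to every participant.
    nearby = [s for s in servers if worst_distance(s) <= max_km]
    # If none qualifies, fall back to the single closest server.
    candidates = nearby or [min(servers, key=worst_distance)]
    # Stage 2: among the candidates, pick the least utilized one.
    return min(candidates, key=lambda s: s.utilization)
```

With two clients in Japan and servers `tokyo-a` (utilization 0.8), `tokyo-b` (0.2), and `virginia` (0.05), the distant server is filtered out in the first stage even though it is the least loaded, and `tokyo-b` is chosen.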

The call quality control unit 110 is a functional unit that controls communication quality among the plurality of clients 2 participating in a group call. The call quality control unit 110 monitors the data transfer delay of the clients 2 in group calls managed by the call group management unit 140. When a data transfer delay occurs at a client 2, that is, when the condition of its communication line deteriorates because of a weak radio signal or the like, the call quality control unit 110 instructs the other clients 2 participating in the group call to lower their data quality and thereby reduce the data volume, so that the affected client 2 can maintain communication. The call quality control unit 110 may monitor the data transfer delay of the clients 2 by acquiring the communication status of each client 2 at a predetermined interval from the VoIP server 11 controlling the group call. When the data transfer delay at the affected client 2 has recovered, the call quality control unit 110 instructs the other clients 2 participating in the group call to lift the data quality restriction.
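
A minimal sketch of this degrade-and-restore behavior is given below. The specification does not give thresholds or bitrates, so the delay threshold, the two bitrate constants, and the names `QualityController` and `on_report` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

NORMAL_KBPS = 32   # assumed bitrate when no client is delayed
REDUCED_KBPS = 12  # assumed bitrate while some client is delayed


@dataclass
class QualityController:
    members: set
    delay_threshold_ms: float = 300.0  # assumed threshold
    delayed: set = field(default_factory=set)

    def on_report(self, client_id, delay_ms):
        """Handle one periodic status report; return {client: bitrate} commands."""
        was_degraded = bool(self.delayed)
        if delay_ms > self.delay_threshold_ms:
            self.delayed.add(client_id)
        else:
            self.delayed.discard(client_id)
        is_degraded = bool(self.delayed)
        if is_degraded == was_degraded:
            return {}  # no state change, so nothing to command
        target = REDUCED_KBPS if is_degraded else NORMAL_KBPS
        # Commands go to the other participants, not to the reporting client.
        return {m: target for m in self.members if m != client_id}
```

Commands are issued only on a state change, so repeated reports of the same delay do not flood the other clients with identical instructions.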

In addition, when communication with a client 2 is lost, that is, when the client 2 can no longer communicate because of a weak radio signal or the like, the call quality control unit 110 notifies the other clients 2 participating in the group call that communication with that client 2 has been lost. The call quality control unit 110 may detect the loss by acquiring the communication status of each client 2 at a predetermined interval from the VoIP server 11 controlling the group call. When it detects that communication with the disconnected client 2 has been restored, the call quality control unit 110 notifies the other clients 2 participating in the group call and performs control so that the recovered client 2 rejoins the group call.

The client management unit 120 is a functional unit that manages client information, which is information about the clients 2 that make group calls. The client information includes at least identification information that uniquely identifies the corresponding client 2, and may further include information such as the name of the user who owns that client 2 and information about the geographic location of that client 2. As in commonly provided services, the client management unit 120 may register, modify, and delete client information in response to client information registration requests, client information requests, client information deletion requests, and the like from the clients 2.

The server management unit 130 is a functional unit that manages server information, which is information about the servers 1 whose VoIP servers 11 can be commanded and controlled by the API server 10. The server information includes at least the geographic location of the server and its location on the network (such as its IP address), and may further include the utilization rate of the VoIP server 11 in that server, information about the administrator of that server, and the like. The server management unit 130 may register, modify, and delete server information in response to server information registration, modification, and deletion operations performed by the administrator of the API server 10.

The call group management unit 140 is a functional unit that manages call group information, which is information about a group of clients 2 currently engaged in a group call (hereinafter, a "client group"). The call group information includes at least information identifying the clients 2 participating in the corresponding group call (the identification information registered in the client information for each client 2), information about the VoIP server used for the group call, and the communication status (data delay status, disconnection status, and the like) of each client 2 participating in the group call. The call group management unit 140 may create, modify, and delete call group information in response to call group creation commands, call group deletion commands, call group modification commands, and the like from the call establishment control unit 100 and the call quality control unit 110.

The API server 10 of this embodiment configured as described above can, based on the location of each client 2 participating in a group call and the utilization rate of each VoIP server 11, route the group call request from each client 2 to a VoIP server 11 that allows low-latency connection. The API server 10 of this embodiment also detects, via the VoIP servers 11 installed in each region, whether each client 2 in a group call is alive and performs failover processing as the situation requires, so it can provide the group call service best suited to the situation without troubling the user.

<3. Client functional configuration>
FIG. 3 is a diagram showing a schematic functional configuration of the mobile communication terminal 20. The mobile communication terminal 20 includes a group call management unit 201, a group call control unit 202, a noise estimation unit 203, an utterance candidate determination unit 204, an utterance determination unit 205, a playback audio data transmission unit 207, a communication unit 208, and a short-range wireless communication unit 209. These functional units are realized by the CPU controlling the storage devices, I/O circuits, and the like of the mobile communication terminal 20.

The group call management unit 201 is a functional unit that exchanges information about group call management with the API server 10 via the communication unit 208 and manages the start, end, and so on of group calls. The group call management unit 201 manages a group call by sending requests such as group call start requests, client addition requests, and group call end requests to the API server 10 and by issuing commands to the group call control unit 202, described later, according to the response of the API server 10 to each request.

The group call control unit 202 is a functional unit that, based on commands from the group call management unit 201, controls the transmission and reception of audio data to and from the other clients 2 participating in the group call and to and from the headset 21. Using the noise estimation unit 203, the utterance candidate determination unit 204, and the utterance determination unit 205 described later, the group call control unit 202 performs utterance detection and data quality control on the audio data of the user's speech received from the headset 21.

The noise estimation unit 203 is a functional unit that estimates the average environmental sound from the audio data of the user's speech received from the headset 21. That audio data contains both the user's speech and environmental sound. The noise estimation unit 203 may use a known noise estimation method such as minimum mean square error (MMSE) estimation, maximum likelihood estimation, or maximum a posteriori estimation. For example, the noise estimation unit 203 may sequentially update the power spectrum of the environmental sound under the MMSE criterion, based on a speech presence probability estimated for each sample frame, and use that power spectrum to estimate the environmental sound, which is the noise, within the audio data.
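
One way such a sequential update can look is sketched below: each frequency bin's noise power is updated with a soft weighting by an estimated speech presence probability, so that bins dominated by speech barely move the noise estimate. This is a simplified illustration, not the method of the specification; the fixed a priori SNR `XI_H1`, the smoothing factor `ALPHA`, and the function names are assumptions.

```python
import math

XI_H1 = 10 ** (15 / 10)  # assumed fixed a priori SNR under speech presence (15 dB)
ALPHA = 0.8              # assumed smoothing factor for the recursive update


def speech_presence_probability(power, noise_power):
    """Per-bin probability that speech is present, assuming equal priors."""
    snr_post = power / max(noise_power, 1e-12)
    return 1.0 / (1.0 + (1 + XI_H1) * math.exp(-snr_post * XI_H1 / (1 + XI_H1)))


def update_noise(noise_psd, frame_psd):
    """Return the updated noise power spectrum after observing one frame."""
    updated = []
    for sigma2, y2 in zip(noise_psd, frame_psd):
        p = speech_presence_probability(y2, sigma2)
        # Expected noise power: keep the old estimate where speech is likely,
        # follow the observation where speech is unlikely.
        expected_noise = p * sigma2 + (1 - p) * y2
        updated.append(ALPHA * sigma2 + (1 - ALPHA) * expected_noise)
    return updated
```

A bin whose power jumps far above the current noise estimate is treated as speech and leaves the estimate almost unchanged, whereas a modest rise is absorbed gradually into the noise spectrum.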

The utterance candidate determination unit 204 is a functional unit that, based on the estimate by the noise estimation unit 203 of the environmental sound that constitutes noise, identifies sounds in the audio data that differ from the average environmental sound as utterance candidates. By comparing long-term spectral variation over units of several frames with the power spectrum of the environmental sound estimated by the noise estimation unit 203, the utterance candidate determination unit 204 determines that the non-stationary portions of the audio data are audio data from the user's speech.
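
As a rough sketch of this comparison, the following function averages frame energy over a short window of frames and flags a frame as an utterance candidate when that average exceeds the estimated noise energy by a margin. The window length, the 6 dB margin, and the name `candidate_frames` are assumptions; the specification does not state the actual comparison rule.

```python
import math


def candidate_frames(frame_psds, noise_psd, window=3, ratio_db=6.0):
    """Return one boolean per frame: True where the frame is an utterance candidate."""
    noise_energy = max(sum(noise_psd), 1e-12)
    flags = []
    for i in range(len(frame_psds)):
        # Average the energy over the last `window` frames (long-term view).
        recent = frame_psds[max(0, i - window + 1):i + 1]
        mean_energy = max(sum(sum(f) for f in recent) / len(recent), 1e-12)
        excess_db = 10 * math.log10(mean_energy / noise_energy)
        flags.append(excess_db >= ratio_db)
    return flags
```

Because the decision uses a multi-frame average, a candidate region extends slightly past the last loud frame, which helps avoid clipping the tail of an utterance.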

The utterance determination unit 205 is a functional unit that, within the portions the utterance candidate determination unit 204 has determined to be audio data from the user's speech, identifies the portions estimated to be sudden environmental sounds other than a human voice. For those candidate portions, the utterance determination unit 205 estimates, for example, the proportion of periodic spectral components to determine whether the audio data derives from sound produced by a human throat or the like. The utterance determination unit 205 also estimates the degree of echo from the speech waveform to evaluate the distance to the speaker and whether the sound is a direct wave, and thereby determines whether the audio data derives from speech uttered by the speaker.
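
Voiced speech is strongly periodic while a sudden click or bang is not, which is what makes a periodicity measure useful here. The sketch below substitutes a time-domain normalized autocorrelation peak for the spectral periodic-component ratio mentioned above; this substitution, the 80 to 400 Hz pitch range, and the name `periodicity` are assumptions for illustration.

```python
import math


def periodicity(samples, sample_rate, f_min=80.0, f_max=400.0):
    """Peak normalized autocorrelation over lags in an assumed human pitch range."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if energy == 0:
        return 0.0
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best
```

A 200 Hz tone sampled at 8 kHz scores close to 1, while a single-sample click scores 0, so a threshold between the two could separate voiced speech from impulsive noise.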

The audio data transmission unit 206 encodes the audio data in the range that the utterance candidate determination unit 204 identified as utterance candidates, excluding the portions that the utterance determination unit 205 determined to be sudden environmental sounds, and transmits it to the VoIP server. When encoding, the audio data transmission unit 206 uses the encoding scheme and communication quality that the group call control unit 202 has determined based on commands from the call quality control unit 110 of the API server 10.

The playback audio data transmission unit 207 transmits audio data that has been received from the VoIP server via the communication unit 208 and decoded to the headset 21 via the short-range wireless communication unit 209.

The communication unit 208 is a functional unit that controls communication over the mobile network. The communication unit 208 is realized using a communication interface for a general mobile communication network or the like.
The short-range wireless communication unit 209 is a functional unit that controls short-range wireless communication such as Bluetooth (registered trademark). The short-range wireless communication unit 209 is realized using a general short-range wireless communication interface.

FIG. 4 is a diagram showing a schematic functional configuration of the headset 21. The headset 21 includes a voice detection unit 211, an utterance enhancement unit 212, a playback control unit 213, and a short-range wireless communication unit 216. These functional units are realized by the CPU controlling the storage devices, I/O circuits, and the like of the headset 21.

The voice detection unit 211 is a functional unit that detects the speech of the user wearing the headset 21 and converts it into audio data. The voice detection unit 211 consists of the microphone, A/D conversion circuit, audio data encoder, and the like of the headset 21. The voice detection unit 211 preferably has at least two microphones.

The utterance enhancement unit 212 is a functional unit that emphasizes the speech of the user wearing the headset 21 within the audio data produced by the voice detection unit 211, so that the speech can be detected. The utterance enhancement unit 212 emphasizes the user's speech relative to the environmental sound using, for example, a known beamforming algorithm. Because this processing suppresses the environmental sound in the audio data relative to the user's speech, it makes it possible to improve sound quality, improve the performance of the downstream signal processing, and reduce its computational load. The audio data converted by the utterance enhancement unit 212 is transmitted to the mobile communication terminal 20 via the short-range wireless communication unit 216.
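
The simplest beamformer of this kind is delay-and-sum: the two microphone signals are time-aligned for the wearer's mouth and averaged, so the aligned speech adds coherently while sound from other directions does not. The sketch below assumes two microphones and a known, non-negative integer sample delay; the function name and parameters are hypothetical, and the specification does not say which beamforming algorithm is used.

```python
def delay_and_sum(mic_a, mic_b, delay_samples):
    """Align mic_b to mic_a and average the two channels.

    Assumes the wearer's voice reaches mic_b `delay_samples` samples
    later than mic_a (delay_samples >= 0).
    """
    n = min(len(mic_a), len(mic_b) - delay_samples)
    return [0.5 * (mic_a[i] + mic_b[i + delay_samples]) for i in range(n)]
```

For example, if `mic_b` carries the same voice one sample late, `delay_and_sum(mic_a, mic_b, 1)` recovers the voice at full amplitude, whereas uncorrelated noise in the two channels is attenuated by the averaging.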

The playback control unit 213 is a functional unit that plays back audio data received from the mobile communication terminal 20 via the short-range wireless communication unit 216. The playback control unit 213 consists of the audio data decoder, D/A conversion circuit, speaker, and the like of the headset 21. When playing back the sound in an utterance section of the audio data received from the mobile communication terminal 20, the playback control unit 213 uses the environmental sound detected by the microphone of the headset 21 to play the audio in a form that is easy for the user to hear. Based on the ambient noise estimated by the voice detection unit 211, the playback control unit 213 may perform noise canceling to cancel out the environmental sound the user hears, making the playback sound easier to hear, or it may raise the playback volume in step with the ambient noise level, making the playback sound relatively easier to hear.
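
The second option, raising the playback volume in step with ambient noise, could be sketched as follows. The reference level `quiet_rms`, the 12 dB ceiling, and both function names are assumptions chosen for illustration; samples are taken to be floats in the range -1.0 to 1.0.

```python
import math


def playback_gain_db(noise_rms, quiet_rms=0.01, max_boost_db=12.0):
    """Boost in dB that tracks ambient noise above an assumed quiet level."""
    if noise_rms <= quiet_rms:
        return 0.0
    boost = 20 * math.log10(noise_rms / quiet_rms)
    return min(boost, max_boost_db)


def apply_gain(samples, gain_db):
    """Scale samples by gain_db and clip to the valid range."""
    g = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * g)) for s in samples]
```

Capping the boost and clipping the output keep the playback level bounded even in very loud surroundings.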

By performing multifaceted audio data processing that links various estimation processes relating to speech and environmental sound, the client 2 of this embodiment configured as described above can reproduce speech clearly while reducing the size of the audio data sent over the communication path. This reduces the power consumption of each device constituting the client 2 and greatly improves the UX (User Experience) of calls.

The following describes the utterance detection function, the communication control function, and the audio playback control function, which are the characteristic functions of the communication system 300 configured as described above, using sequence charts that show the flow of operations.
<4. Utterance detection function>
FIG. 5 is a sequence chart showing the flow of processing executed on the headset and the mobile communication terminal for the utterance detection function.
● [Step SA01] The voice detection unit 211 detects the user's speech, including environmental sound, and converts it into audio data.
● [Step SA02] The utterance enhancement unit 212 emphasizes the user's speech in the audio data converted in step SA01 relative to the environmental sound.
● [Step SA03] The short-range wireless communication unit 216 transmits the audio data converted in step SA02 to the first mobile communication terminal 20.

● [Step SA04] The noise estimation unit 203 analyzes the audio data received from the first headset and estimates the environmental sound, which is the noise contained in the audio data.
● [Step SA05] Based on the estimate of the environmental sound produced by the noise estimation unit 203 in step SA04, the utterance candidate determination unit 204 identifies sounds in the audio data that differ from the average environmental sound as utterance candidates.
● [Step SA06] Within the portions of the audio data that the utterance candidate determination unit 204 identified as utterance candidates in step SA05, the utterance determination unit 205 identifies the portions estimated to be sudden environmental sounds or speech uttered from a position distant from the microphone of the headset.
● [Step SA07] The group call control unit 202 takes the range identified as utterance candidates in step SA05, excludes the portions that the utterance determination unit 205 determined in step SA06 to be sudden environmental sounds or speech uttered from a position distant from the microphone of the headset, encodes the remaining audio data with the encoding scheme and communication quality determined through its exchange with the VoIP server 11, and transmits the encoded audio data to the VoIP server.
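
The range selection in steps SA05 to SA07 amounts to keeping the frames that are utterance candidates and were not rejected, then grouping them into contiguous ranges for encoding. A minimal sketch, assuming per-frame boolean flags and a hypothetical function name `frames_to_send`:

```python
def frames_to_send(candidate_flags, rejected_flags):
    """Return half-open (start, end) frame ranges to encode and transmit.

    candidate_flags: per-frame result of step SA05 (True = utterance candidate)
    rejected_flags:  per-frame result of step SA06 (True = sudden noise or
                     distant speech, to be excluded)
    """
    keep = [c and not r for c, r in zip(candidate_flags, rejected_flags)]
    ranges = []
    start = None
    for i, k in enumerate(keep):
        if k and start is None:
            start = i                  # a kept range begins
        elif not k and start is not None:
            ranges.append((start, i))  # a kept range ends
            start = None
    if start is not None:
        ranges.append((start, len(keep)))
    return ranges
```

Only the frames inside the returned ranges would be encoded, which is how the amount of transmitted audio data is reduced.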

FIG. 6 is a diagram illustrating how the audio detected according to the sequence chart of FIG. 5 is converted into the audio data that is transmitted. As shown in FIG. 6, the communication system of the present invention extracts from the detected audio only the portions needed to reproduce the speech, so the audio data encoded and transmitted to the VoIP server 11 can be smaller than the audio data transmitted in an ordinary communication system.

<5. Audio playback control function>
FIG. 7 is a sequence chart showing the flow of processing executed on the headset and the mobile communication terminal for the audio playback control function.
● [Step SB01] The group call control unit 202 decodes the received data into audio data using the encoding scheme determined through its exchange with the VoIP server 11.
● [Step SB02] The playback audio data transmission unit 207 transmits the audio data decoded in step SB01 to the second headset 21.

● [Step SB03] The voice detection unit 211 detects environmental sound and converts it into audio data.
● [Step SB04] The playback control unit 213 plays back the audio data received from the second mobile communication terminal and, during the utterance sections of that audio data, applies processing that makes the playback sound easier to hear over the environmental sound detected in step SB03.

<6. Communication control function>
FIG. 8 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminals for the communication control function when a data transfer delay occurs.
● [Step SC01] The VoIP server 11 detects a data transfer delay at the second mobile communication terminal 20.
● [Step SC02] The VoIP server 11 notifies the API server 10 of the data transfer delay status of the second mobile communication terminal 20.

[Step SC03] The communication quality control unit 110 determines a communication quality appropriate to the data transfer delay status of the second mobile communication terminal 20 reported by the VoIP server 11. It then instructs the VoIP server 11, and the first mobile communication terminal 20 belonging to the same client group as the second mobile communication terminal 20, to switch to the determined communication quality.
[Step SC04] The VoIP server 11 changes the communication quality of the client group to which the second mobile communication terminal 20 belongs to the quality instructed in step SC03.
[Step SC05] The first mobile communication terminal 20 changes its communication quality to the quality instructed in step SC03.
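The decision in step SC03 can be sketched as a lookup from reported delay to a codec bitrate, followed by a command to the VoIP server and to the other terminals in the group. The patent does not state what "communication quality" consists of, so the bitrate ladder, the delay thresholds, and the identifiers below are all hypothetical.

```python
# Hypothetical quality ladder: (maximum tolerated delay in ms, codec bitrate in kbit/s).
QUALITY_LADDER = [(150, 32), (400, 16), (1000, 8)]
LOWEST_BITRATE = 6  # assumed floor for delays beyond the ladder

def decide_bitrate(delay_ms):
    """Pick the highest bitrate whose delay bound still covers the reported delay."""
    for max_delay_ms, bitrate in QUALITY_LADDER:
        if delay_ms <= max_delay_ms:
            return bitrate
    return LOWEST_BITRATE

def quality_commands(group_members, delayed_terminal, delay_ms):
    """Build the SC03 commands: one for the VoIP server, one per other group member."""
    bitrate = decide_bitrate(delay_ms)
    targets = ["voip_server"] + [m for m in group_members if m != delayed_terminal]
    return {target: bitrate for target in targets}

# A 500 ms delay reported for terminal_2 lowers the quality for the rest of the group.
commands = quality_commands(["terminal_1", "terminal_2"], "terminal_2", delay_ms=500)
```

Under these assumptions the recovery case of FIG. 9 needs no separate logic: once the reported delay drops back under the first threshold, the same lookup returns the top bitrate again.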

FIG. 9 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminals for the communication control function when the data transfer status recovers.
[Step SD01] The VoIP server 11 detects that the data transfer status of the second mobile communication terminal 20 has recovered.
[Step SD02] The VoIP server 11 notifies the API server 10 that the data transfer status of the second mobile communication terminal 20 has recovered.

[Step SD03] In response to the recovery of the data transfer status of the second mobile communication terminal 20 reported by the VoIP server 11, the communication quality control unit 110 instructs the VoIP server 11, and the first mobile communication terminal 20 belonging to the same client group as the second mobile communication terminal 20, to restore the communication quality.
[Step SD04] The VoIP server 11 restores the communication quality of the client group to which the second mobile communication terminal 20 belongs.
[Step SD05] The first mobile communication terminal 20 restores its communication quality.

FIG. 10 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminals for the communication control function when communication is interrupted.
[Step SE01] The VoIP server 11 detects that communication with the second mobile communication terminal 20 has been interrupted.
[Step SE02] The VoIP server 11 notifies the API server 10 of the communication interruption of the second mobile communication terminal 20.

[Step SE03] The communication quality control unit 110 notifies the first mobile communication terminal 20, which belongs to the same client group as the second mobile communication terminal 20, that communication with the second mobile communication terminal 20 has been interrupted.
[Step SE04] The VoIP server 11 changes the information on the communication state of the second mobile communication terminal 20 to the interrupted state.
[Step SE05] The first mobile communication terminal 20 changes the information on the communication state of the second mobile communication terminal 20 to the interrupted state.

FIG. 11 is a sequence chart showing the flow of processing executed on the API server, the VoIP server, and the mobile communication terminals for the communication control function when communication recovers after an interruption.
[Step SF01] The VoIP server 11 detects that the communication status of the second mobile communication terminal 20 has recovered.
[Step SF02] The VoIP server 11 notifies the API server 10 that the communication status of the second mobile communication terminal 20 has recovered.

[Step SF03] The communication quality control unit 110 notifies the first mobile communication terminal 20, which belongs to the same client group as the second mobile communication terminal 20, that communication with the second mobile communication terminal 20 has recovered.
[Step SF04] The VoIP server 11 changes the information on the communication state of the second mobile communication terminal 20 to the normal state.
[Step SF05] The first mobile communication terminal 20 changes the information on the communication state of the second mobile communication terminal 20 to the normal state.
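The interruption and recovery sequences of FIG. 10 and FIG. 11 amount to keeping a per-terminal communication state in step on the VoIP server and on the remaining terminals. A minimal sketch follows, assuming two states and string event names; `LinkState`, `GroupStatus`, and `broadcast` are hypothetical names introduced only for illustration.

```python
from enum import Enum

class LinkState(Enum):
    NORMAL = "normal"
    DISRUPTED = "disrupted"

class GroupStatus:
    """Per-terminal link state as held by one participant (VoIP server or terminal)."""

    def __init__(self, terminals):
        self.state = {t: LinkState.NORMAL for t in terminals}

    def apply(self, terminal, event):
        """Update the stored state of one terminal from a notified event."""
        if event == "disrupted":
            self.state[terminal] = LinkState.DISRUPTED
        elif event == "recovered":
            self.state[terminal] = LinkState.NORMAL
        else:
            raise ValueError(f"unknown event: {event}")

def broadcast(event, terminal, holders):
    """Deliver one notification to every participant that tracks the group."""
    for holder in holders:
        holder.apply(terminal, event)

# The VoIP server and the first terminal both track the second terminal's state.
voip_server = GroupStatus(["terminal_1", "terminal_2"])
first_terminal = GroupStatus(["terminal_1", "terminal_2"])
broadcast("disrupted", "terminal_2", [voip_server, first_terminal])
```

A later `broadcast("recovered", "terminal_2", ...)` call returns both copies to the normal state, mirroring the recovery sequence.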

Embodiments of the present invention have been described above. The present invention is not limited to these examples, however, and can be carried out in various forms with appropriate modifications.

DESCRIPTION OF SYMBOLS
  1 Server
  2 Client
  10 API server
  11 VoIP server
  20 Mobile communication terminal
  21 Headset
  100 Call establishment control unit
  110 Call quality control unit
  120 Client management unit
  130 Server management unit
  140 Call group management unit
  201 Group call management unit
  202 Group call control unit
  203 Noise estimation unit
  204 Utterance candidate determination unit
  205 Utterance determination unit
  206 Voice data transmission unit
  207 Reproduction voice data transmission unit
  208 Communication unit
  209 Short-range wireless communication unit
  211 Voice detection unit
  212 Utterance enhancement unit
  213 Reproduction control unit
  216 Short-range wireless communication unit
  300 Communication system

Claims (4)

A communication system for conducting a group call among a plurality of clients via a VoIP server, the communication system comprising:
an API server that manages the group call,
wherein each of the plurality of clients comprises a mobile communication terminal that communicates via a mobile communication network, and a headset that exchanges voice data with the mobile communication terminal by short-range wireless communication,
wherein the headset comprises:
a voice detection unit that detects voice;
an utterance enhancement unit that enhances an utterance portion included in the voice detected by the voice detection unit relative to environmental sound; and
a reproduction control unit that reproduces voice data received from the mobile communication terminal such that, in an utterance portion of the voice data, the voice detected by the voice detection unit is relatively easy to hear over ambient noise,
wherein the mobile communication terminal comprises:
a noise estimation unit that estimates noise included in the voice data received from the headset;
an utterance candidate determination unit that determines, based on the result of the estimation by the noise estimation unit, a range of the voice data that is a candidate for an utterance portion;
an utterance determination unit that determines, within the candidate range determined by the utterance candidate determination unit, a portion that is human voice;
a voice data transmission unit that transmits to the VoIP server the portion of the voice data determined by the utterance determination unit to be human voice; and
a reproduction voice data transmission unit that transmits voice data received from the VoIP server to the headset,
wherein the API server comprises a communication quality control unit that, based on the communication status between each of the clients and the VoIP server, notifies each of the clients and the VoIP server of a command for controlling the communication quality of the group call, and
wherein the voice data transmission unit encodes the portion of the voice data determined by the utterance determination unit to be human voice at a communication quality based on the command notified by the communication quality control unit, and transmits the encoded portion to the VoIP server.
An API server that manages a group call conducted in the communication system according to claim 1, the API server comprising:
a communication quality control unit that, based on the communication status between each of the clients and the VoIP server, notifies each of the clients and the VoIP server of a command for controlling the communication quality of the group call.
A headset used in the communication system according to claim 1, the headset comprising:
a voice detection unit that detects voice;
an utterance enhancement unit that enhances an utterance portion included in the voice detected by the voice detection unit relative to environmental sound; and
a reproduction control unit that reproduces voice data received from the mobile communication terminal while applying noise canceling, in an utterance portion of the voice data, to the voice detected by the voice detection unit.
A mobile communication terminal used in the communication system according to claim 1, the mobile communication terminal comprising:
a noise estimation unit that estimates noise included in the voice data received from the headset;
an utterance candidate determination unit that determines, based on the result of the estimation by the noise estimation unit, a range of the voice data that is a candidate for an utterance portion;
an utterance determination unit that determines, within the candidate range determined by the utterance candidate determination unit, a portion that is human voice;
a voice data transmission unit that transmits to the VoIP server the portion of the voice data determined by the utterance determination unit to be human voice; and
a reproduction voice data transmission unit that transmits voice data received from the VoIP server to the headset.
PCT/JP2017/009756 2017-03-10 2017-03-10 Communication system, api server used in communication system, headset, and portable communication terminal Ceased WO2018163418A1 (en)

Priority Applications (11)

Application Number Priority Date Filing Date Title
PCT/JP2017/009756 WO2018163418A1 (en) 2017-03-10 2017-03-10 Communication system, api server used in communication system, headset, and portable communication terminal
US16/490,766 US20200028955A1 (en) 2017-03-10 2018-03-07 Communication system and api server, headset, and mobile communication terminal used in communication system
CN201880015280.XA CN110663244B (en) 2017-03-10 2018-03-07 A communication system and portable communication terminal
EP23183175.1A EP4239992A3 (en) 2017-03-10 2018-03-07 Communication system and mobile communication terminal
JP2018526268A JP6416446B1 (en) 2017-03-10 2018-03-07 Communication system, API server used in communication system, headset, and portable communication terminal
CN202110473317.7A CN113114866A (en) 2017-03-10 2018-03-07 Portable communication terminal, control method thereof, communication system, and recording medium
PCT/JP2018/008697 WO2018164165A1 (en) 2017-03-10 2018-03-07 Communication system and api server, headset, and mobile communication terminal used in communication system
EP18764411.7A EP3595278B1 (en) 2017-03-10 2018-03-07 Communication system and mobile communication terminal
JP2018187677A JP6742640B2 (en) 2017-03-10 2018-10-02 Mobile communication terminal, program, and control method for mobile communication terminal
JP2018187678A JP6815654B2 (en) 2017-03-10 2018-10-02 Communication systems, programs, and methods of controlling communication systems
JP2020207754A JP7219492B2 (en) 2017-03-10 2020-12-15 Communication system, program, and control method for communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/009756 WO2018163418A1 (en) 2017-03-10 2017-03-10 Communication system, api server used in communication system, headset, and portable communication terminal

Publications (1)

Publication Number Publication Date
WO2018163418A1 true WO2018163418A1 (en) 2018-09-13

Family

ID=63447456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/009756 Ceased WO2018163418A1 (en) 2017-03-10 2017-03-10 Communication system, api server used in communication system, headset, and portable communication terminal

Country Status (1)

Country Link
WO (1) WO2018163418A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007318740A (en) * 2006-04-24 2007-12-06 Fujitsu Ltd Response support method, response support system, response support device, and computer program
JP2007300243A (en) * 2006-04-27 2007-11-15 Kyocera Corp Group call notification method and mobile station
JP2010050695A (en) * 2008-08-21 2010-03-04 Nittetsu Elex Co Ltd Communication system
JP2011097268A (en) * 2009-10-28 2011-05-12 Sony Corp Playback device, headphone, and playback method
JP2016189121A (en) * 2015-03-30 2016-11-04 ソニー株式会社 Information processing device, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17899862

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29/11/2019)

NENP Non-entry into the national phase

Ref country code: JP

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 17899862

Country of ref document: EP

Kind code of ref document: A1