
WO2019142231A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Info

Publication number
WO2019142231A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
section
unit
amount
participants
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2018/000942
Other languages
English (en)
Japanese (ja)
Inventor
武志 水本
哲也 菅原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hylable Inc
Original Assignee
Hylable Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hylable Inc
Priority to PCT/JP2018/000942 (WO2019142231A1)
Priority to JP2018502279A (JP6589040B1)
Publication of WO2019142231A1
Current legal status: Ceased

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10: Transforming into visible information
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program and a voice analysis system.
  • The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
  • In the Harkness method, the transitions of each participant's utterances are recorded as lines. In this way, it is possible to analyze each participant's contribution to the discussion and their relationships with the other participants.
  • The Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.
  • However, since the Harkness method shows the overall tendency of speech over the whole period from the start to the end of a discussion, it cannot show how each participant's amount of speech changes over time. There is therefore the problem that it is difficult to perform an analysis based on the temporal change of each participant's speech volume.
  • The present invention has been made in view of these points, and aims to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system that can output information for performing an analysis based on the temporal change of participants' speech volume in a discussion.
  • A voice analysis device according to one aspect of the present invention includes: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that identifies, for each time, the amount of speech of each of the plurality of participants in the voice; a section setting unit that sets sections in the voice based on input from a user; and an output unit that outputs a graph in which the temporal changes of the speech amounts of the plurality of participants are stacked, together with information indicating the sections in the graph.
  • The output unit may output, as the information indicating the section, a position on the graph corresponding to the time at which one section switches to another.
  • The section setting unit may set the section based on at least one of an operation on a communication terminal that communicates with the voice analysis device, an operation on a sound collection device that acquires the voice, and a predetermined sound included in the voice.
  • the output unit may output the graph in which temporal changes in the amount of utterance are stacked in ascending order of the degree of variation in the amount of utterance calculated for each of the plurality of participants.
  • The output unit may output, for each section, the graph in which the temporal changes of the speech amounts are stacked in ascending order of the degree of variation of the speech amount in that section, calculated for each of the plurality of participants.
  • The output unit may output a plurality of graphs for the same section set in a plurality of voices.
  • The output unit may output, on the graph, information indicating an event that occurred within the time of the voice.
  • The analysis unit may specify, as the speech amount, a value obtained by dividing the length of time during which a participant speaks within a predetermined time window by the length of the time window.
  • A voice analysis method according to one aspect of the present invention includes the steps, executed by a processor, of: acquiring voices uttered by a plurality of participants; identifying, for each time, the amount of speech of each of the plurality of participants in the voice; setting sections in the voice based on input from a user; and outputting a graph in which the temporal changes of the speech amounts of the plurality of participants are stacked, together with information indicating the sections in the graph.
  • A voice analysis program according to one aspect of the present invention causes a computer to execute the steps of: acquiring voices uttered by a plurality of participants; identifying, for each time, the amount of speech of each of the plurality of participants in the voice; setting sections in the voice based on input from a user; and outputting a graph in which the temporal changes of the speech amounts of the plurality of participants are stacked, together with information indicating the sections in the graph.
  • A voice analysis system according to one aspect of the present invention includes a voice analysis device and a communication terminal capable of communicating with the voice analysis device, the communication terminal having a display unit for displaying information. The voice analysis device includes: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that identifies, for each time, the amount of speech of each of the plurality of participants in the voice; a section setting unit that sets sections in the voice based on input from a user; and an output unit that causes the display unit to display a graph in which the temporal changes of the speech amounts of the plurality of participants are stacked, together with information indicating the sections in the graph.
  • FIG. 1 is a schematic view of a speech analysis system S according to the present embodiment.
  • the voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
  • the number of sound collectors 10 and communication terminals 20 included in the speech analysis system S is not limited.
  • the voice analysis system S may include devices such as other servers and terminals.
  • the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least a part of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
  • the sound collector 10 includes a microphone array including a plurality of sound collectors (microphones) arranged in different orientations.
  • the microphone array includes eight microphones equally spaced on the same circumference in the horizontal plane with respect to the ground.
  • the sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
  • the communication terminal 20 is a communication device capable of performing wired or wireless communication.
  • the communication terminal 20 is, for example, a portable terminal such as a smart phone terminal or a computer terminal such as a personal computer.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis result by the voice analysis device 100.
  • the voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 by a voice analysis method described later. Further, the voice analysis device 100 transmits the result of the voice analysis to the communication terminal 20.
  • FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flow, and there may be data flows not shown in FIG. In FIG. 2, each block is not a hardware (apparatus) unit configuration but a function unit configuration. As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in multiple devices. Transfer of data between the blocks may be performed via any means such as a data bus, a network, a portable storage medium, and the like.
  • the communication terminal 20 has a display unit 21 for displaying various information, and an operation unit 22 for receiving an operation by an analyst.
  • the display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display.
  • the operation unit 22 includes operation members such as a button, a switch, and a dial.
  • the display unit 21 and the operation unit 22 may be integrally configured by using a touch screen capable of detecting the position of contact by the analyst as the display unit 21.
  • the voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
  • the control unit 110 includes a setting unit 111, a sound acquisition unit 112, a sound source localization unit 113, an analysis unit 114, a section setting unit 115, and an output unit 116.
  • the storage unit 130 includes a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
  • the communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
  • the communication unit 120 includes a processor for performing communication, a connector, an electric circuit, and the like.
  • the communication unit 120 performs predetermined processing on a communication signal received from the outside to acquire data, and inputs the acquired data to the control unit 110. Further, the communication unit 120 performs predetermined processing on the data input from the control unit 110 to generate a communication signal, and transmits the generated communication signal to the outside.
  • the storage unit 130 is a storage medium including a read only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
  • the storage unit 130 stores in advance a program to be executed by the control unit 110.
  • the storage unit 130 may be provided outside the voice analysis device 100, and in this case, data may be exchanged with the control unit 110 via the communication unit 120.
  • the setting information storage unit 131 stores setting information indicating analysis conditions set by the analyst in the communication terminal 20.
  • the voice storage unit 132 stores the voice acquired by the sound collection device 10.
  • the analysis result storage unit 133 stores an analysis result indicating the result of analyzing the voice.
  • the setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or a database configured on the storage unit 130.
  • The control unit 110 is a processor such as a central processing unit (CPU), and functions as the setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 by executing the program stored in the storage unit 130.
  • the functions of the setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 will be described later with reference to FIGS. 3 to 9.
  • At least a part of the functions of the control unit 110 may be performed by an electrical circuit.
  • at least a part of the functions of the control unit 110 may be executed by a program executed via a network.
  • The speech analysis system S is not limited to the specific configuration shown in FIG. 2.
  • the voice analysis device 100 is not limited to one device, and may be configured by connecting two or more physically separated devices in a wired or wireless manner.
  • FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system S according to the present embodiment.
  • the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20.
  • The analysis conditions are information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located with respect to the sound collection device 10.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst, and transmits the setting as the setting information to the voice analysis device 100 (a).
  • the setting unit 111 of the voice analysis device 100 acquires setting information from the communication terminal 20 and causes the setting information storage unit 131 to store the setting information.
  • FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
  • the communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of the analysis condition by the analyst.
  • the setting screen A includes a position setting area A1, a start button A2, and an end button A3.
  • The position setting area A1 is an area for setting the direction in which each participant U is actually located with respect to the sound collection device 10 in the discussion to be analyzed.
  • the position setting area A1 represents a circle centered on the position of the sound collector 10 as shown in FIG. 4, and further represents an angle based on the sound collector 10 along the circle.
  • the analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
  • In the position setting area A1, identification information (here, U1 to U4) for identifying each participant U is assigned and displayed.
  • In the example of FIG. 4, four participants U1 to U4 are set.
  • The portion corresponding to each participant U in the position setting area A1 is displayed in a different color for each participant. Thereby, the analyst can easily recognize the direction set for each participant U.
  • the start button A2 and the end button A3 are virtual buttons displayed on the display unit 21, respectively.
  • The communication terminal 20 transmits a start instruction signal to the voice analysis device 100 when the analyst presses the start button A2.
  • The communication terminal 20 transmits an end instruction signal to the voice analysis device 100 when the analyst presses the end button A3.
  • The period from the analyst's start instruction to the end instruction is treated as one discussion.
  • When the voice acquisition unit 112 of the voice analysis device 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing the sound collection device 10 to start acquiring voice (b). When the sound collection device 10 receives this signal, it starts collecting voice. Similarly, when the voice acquisition unit 112 of the voice analysis device 100 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the sound collection device 10 to end the voice acquisition. When the sound collection device 10 receives this signal, it ends the acquisition of voice.
  • the sound collection device 10 acquires voices in each of a plurality of sound collection units, and internally records the sound as the sound of each channel corresponding to each sound collection unit. Then, the sound collection device 10 transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collector 10 may transmit the acquired voice sequentially or may transmit a predetermined amount or a predetermined time of sound. Further, the sound collection device 10 may collectively transmit the sound from the start to the end of the acquisition.
  • the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
  • the voice analysis device 100 analyzes voice at predetermined timing using the voice acquired from the sound collection device 10.
  • The voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction on the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
  • The voice analysis device 100 may also analyze the voice when the voice acquisition ends. In this case, the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. In addition, the voice analysis device 100 may analyze the voice sequentially during its acquisition (that is, in real-time processing). In this case, the voice for a predetermined time in the past (for example, the last 30 seconds) going back from the current time corresponds to the discussion to be analyzed.
  • When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the plurality of channels of voice acquired by the voice acquisition unit 112 (d). Sound source localization is a process of estimating the direction of the sound source included in the voice acquired by the voice acquisition unit 112 for each time (for example, every 10 milliseconds to 100 milliseconds). The sound source localization unit 113 associates the direction of the sound source estimated for each time with the direction of a participant indicated by the setting information stored in the setting information storage unit 131.
  • As long as the sound source localization unit 113 can identify the direction of the sound source based on the voice acquired from the sound collection device 10, a known sound source localization method such as the Multiple Signal Classification (MUSIC) method or a beamforming method can be used.
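  • As an illustration only, the following is a minimal sketch of a delay-and-sum (beamforming) direction-of-arrival estimate for a circular eight-microphone array such as the one described above; it is one of the known localization approaches mentioned, not the specific implementation of the sound source localization unit 113. The sample rate, array radius, frame length, and angular grid are assumptions.

```python
import numpy as np

FS = 16000           # sample rate in Hz (assumption)
C = 343.0            # speed of sound in m/s
RADIUS = 0.05        # array radius in m (assumption)
N_MICS = 8
MIC_ANGLES = np.arange(N_MICS) * 2 * np.pi / N_MICS  # mics equally spaced on a circle

def estimate_direction(frame: np.ndarray, n_candidates: int = 72) -> float:
    """Return the azimuth (degrees) that maximizes the delay-and-sum output power.

    frame: array of shape (N_MICS, n_samples), one short time frame per channel.
    """
    n_samples = frame.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / FS)   # frequency of each FFT bin
    spectra = np.fft.rfft(frame, axis=1)             # per-channel spectra
    candidates = np.linspace(0.0, 2 * np.pi, n_candidates, endpoint=False)

    powers = []
    for theta in candidates:
        # Relative arrival time of each microphone for a far-field plane wave
        # arriving from azimuth theta (mics facing the source receive it earlier).
        delays = -RADIUS * np.cos(theta - MIC_ANGLES) / C
        # Align the channels by phase rotation, then sum them (delay-and-sum).
        steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        summed = np.sum(spectra * steering, axis=0)
        powers.append(np.sum(np.abs(summed) ** 2))

    return float(np.degrees(candidates[int(np.argmax(powers))]))
```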
  • the analysis unit 114 analyzes the voice based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113 (e).
  • the analysis unit 114 may analyze the entire completed discussion as an analysis target, or may analyze a part of the discussion in the case of real-time processing.
  • The analysis unit 114 first determines, for each time (for example, every 10 milliseconds to 100 milliseconds) in the discussion to be analyzed, which participant is speaking.
  • the analysis unit 114 specifies a continuous period from the start to the end of one participant's speech as a speech period, and causes the analysis result storage unit 133 to store the same. When a plurality of participants speak at the same time, the analysis unit 114 specifies a speech period for each participant.
  • Next, the analysis unit 114 calculates the amount of speech of each participant for each time and stores it in the analysis result storage unit 133. Specifically, the analysis unit 114 calculates, as the amount of speech per unit time (the activity level), the length of time during which a participant speaks within a certain time window (for example, 5 seconds) divided by the length of the time window. The analysis unit 114 then repeats the calculation of the amount of speech per unit time for each participant while shifting the time window by a predetermined time (for example, one second) from the start time of the discussion to the end time (or to the current time in the case of real-time processing).
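  • As a minimal sketch of the activity calculation described above (speaking time within a sliding window divided by the window length), assuming the speech periods identified by the analysis unit 114 are available as (start, end) time pairs; the window of 5 seconds and hop of 1 second follow the examples in the text.

```python
from typing import List, Tuple

def speech_amounts(speech_periods: List[Tuple[float, float]],
                   discussion_end: float,
                   window: float = 5.0,
                   hop: float = 1.0) -> List[Tuple[float, float]]:
    """Return (window start time, activity in 0..1) pairs for one participant.

    speech_periods: (start, end) times in seconds during which the
    participant was speaking, as identified by the analysis step.
    """
    results = []
    t = 0.0
    while t + window <= discussion_end:
        spoken = 0.0
        for start, end in speech_periods:
            # Overlap between this speech period and the window [t, t + window).
            overlap = min(end, t + window) - max(start, t)
            if overlap > 0:
                spoken += overlap
        results.append((t, spoken / window))  # speaking time / window length
        t += hop
    return results

# Example: a participant who spoke from 2 s to 6 s has activity 4/5 = 0.8
# in the window starting at 1 s.
```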
  • The section setting unit 115 sets one or more sections for the voice corresponding to the discussion to be analyzed, based on input from the user (a participant or the analyst).
  • A section may be set for each subject of the discussion, such as "Japanese language", "Science", or "Social studies", or for each stage of the discussion, such as "Discussion", "Idea generation", or "Summary".
  • The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice for which they are set.
  • The section information includes the section name and the section time (i.e., the start time and end time of the section within the voice).
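  • For illustration, the section information described above (a section name plus its start and end times within the voice) could be represented as follows; this representation is an assumption, not a data format defined by this disclosure.

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str          # e.g. "Discussion" or "Summary"
    start_time: float  # seconds from the start of the voice
    end_time: float    # seconds from the start of the voice
```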
  • the section setting unit 115 determines a section based on at least one of (1) an operation in the communication terminal 20, (2) an operation in the sound collector 10, and (3) a predetermined sound acquired by the sound collector 10.
  • When a section is set based on an operation on the communication terminal 20, the participant or the analyst inputs the character string and time included in the section information by operating the operation unit 22 (for example, a touch screen, a mouse, or a keyboard) of the communication terminal 20. The participant or the analyst may input the section information after the discussion ends or during the discussion. The section setting unit 115 then receives the section information specified on the communication terminal 20 via the communication unit 120 and stores it in the analysis result storage unit 133.
  • When a section is set based on an operation on the sound collection device 10, the participant or the analyst sets the section by operating an operation unit, such as a switch or a touch screen, provided on the sound collection device 10 when the section switches.
  • The operation of the operation unit of the sound collection device 10 is associated in advance with the switching of a predetermined section (for example, switching from a "discussion" section to an "idea generation" section).
  • the section setting unit 115 receives information indicating an operation from the operation unit of the sound collection device 10 via the communication unit 120, and specifies switching of a predetermined section at the timing of the operation. Then, the section setting unit 115 stores the specified section information in the analysis result storage unit 133.
  • When a section is set based on a predetermined sound, the participant or the analyst uses a device capable of generating sound (for example, a portable terminal or a music playback device) to emit a predetermined switching sound indicating the switching of sections.
  • the switching sound may be a sound wave that can be heard by humans, or an ultrasonic wave that can not be heard by humans.
  • the switching sound indicates the switching of the section by, for example, a predefined frequency or an on / off pattern.
  • the switching sound may be emitted only at the switching timing of the section, or may be emitted continuously in the section.
  • the section setting unit 115 detects the switching sound included in the sound acquired by the sound collection device 10. Then, the section setting unit 115 specifies switching from the section corresponding to the switching sound before the change to the section corresponding to the switching sound after the change at the timing when the switching sound changes. Then, the section setting unit 115 stores the specified section information in the analysis result storage unit 133.
  • the section setting unit 115 detects the switching sound included in the sound acquired by the sound collection device 10. Then, the section setting unit 115 specifies switching of a predetermined section at the timing when the switching sound is emitted. Then, the section setting unit 115 stores the specified section information in the analysis result storage unit 133.
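  • As a minimal sketch of detecting a switching sound by a predefined frequency, as in the option described above: the tone frequency, sample rate, and threshold are assumptions for illustration, and a practical implementation would also need to handle ultrasonic tones and on/off patterns.

```python
import numpy as np

FS = 16000            # sample rate in Hz (assumption)
SWITCH_FREQ = 2000.0  # frequency of the switching sound in Hz (assumption)
THRESHOLD = 0.5       # fraction of spectral magnitude near the tone (assumption)

def contains_switching_sound(frame: np.ndarray) -> bool:
    """Return True if a single-channel analysis frame contains the switching tone."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    idx = int(np.argmin(np.abs(freqs - SWITCH_FREQ)))
    # Energy in a few bins around the target frequency, relative to the whole frame.
    tone = np.sum(spectrum[max(idx - 1, 0):idx + 2])
    return tone / (np.sum(spectrum) + 1e-12) > THRESHOLD
```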
  • the output unit 116 performs control to display the analysis result by the analysis unit 114 on the display unit 21 by transmitting the display information to the communication terminal 20 (f).
  • the output unit 116 is not limited to the display on the display unit 21 and may output the analysis result by other methods such as printing by a printer, data recording to a storage device, and the like. The method of outputting the analysis result by the output unit 116 will be described below with reference to FIGS. 5 to 9.
  • the output unit 116 of the voice analysis device 100 reads out, from the analysis result storage unit 133, the analysis result by the analysis unit 114 and the section information by the section setting unit 115 for the display target discussion.
  • the output unit 116 may display a discussion immediately after the analysis by the analysis unit 114 is completed, or may display a discussion specified by the analyst.
  • FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B.
  • the speech amount screen B is a screen for displaying information indicating time change of the speech amount for each section, and includes a graph B1 of the speech amount, the name of the section B2, and the switching line B3 of the section.
  • The output unit 116 generates display information for displaying the temporal change of the speech amount of each participant for each section, based on the analysis result and the section information read from the analysis result storage unit 133.
  • the graph B1 is a graph showing the time change of the amount of speech of each participant U.
  • the output unit 116 displays the amount of speech (activity) on the vertical axis and the time on the horizontal axis, and displays the amount of speech for each participant U at each time indicated by the analysis result on the display unit 21 as a line graph. At this time, the output unit 116 accumulates the amounts of speech of the participants U at each point in time, that is, displays the sum of the amounts of speech of the participants U in order on the vertical axis.
  • For example, in the graph B1, the value displayed for participant U3 is the total of the speech amounts of participants U3 and U4, the value displayed for participant U2 is the total of the speech amounts of participants U2, U3, and U4, and the value displayed for participant U1 is the total of the speech amounts of participants U1, U2, U3, and U4.
  • the output unit 116 may randomly determine the order of accumulating (summing) the utterance amounts of the participants U, or may determine the order according to a predetermined rule.
  • The output unit 116 can display the amount of speech of the discussion group as a whole in addition to the amount of speech of each participant U.
  • Thereby, the analyst can grasp the temporal change in each participant U's contribution and, at the same time, grasp the temporal change in how lively the group of participants U is as a whole.
  • The output unit 116 displays the area or line of the graph B1 for each participant U in a display mode, such as a color or pattern, that differs for each participant.
  • In the example shown, the graph B1 is displayed in a different pattern for each participant U, and a legend associating each participant U with a pattern is displayed near the graph B1. Thereby, the analyst can easily determine which part of the graph B1 corresponds to which participant U.
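  • As an illustrative sketch of the stacked graph B1 described above, drawn with matplotlib: the participant names, stacking order, section boundaries, and data are placeholders, not values from this disclosure.

```python
import matplotlib.pyplot as plt
import numpy as np

times = np.arange(0, 600, 1.0)            # one activity value per second (10-minute discussion)
rng = np.random.default_rng(0)
activity = {name: np.clip(rng.normal(0.3, 0.15, times.size), 0.0, 1.0)
            for name in ["U1", "U2", "U3", "U4"]}   # placeholder analysis results

fig, ax = plt.subplots()
# stackplot draws each participant's band on top of the sum of the bands below it,
# i.e. the speech amounts are accumulated as in graph B1.
ax.stackplot(times, list(activity.values()), labels=list(activity.keys()))
ax.set_xlabel("time [s]")
ax.set_ylabel("speech amount (activity)")
for t_switch in (200.0, 400.0):           # assumed section boundaries (switching line B3)
    ax.axvline(t_switch, color="k", linestyle="--")
ax.legend(loc="upper right")
plt.show()
```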
  • the section name B2 is a character string representing the section name.
  • the section switching line B3 is a line indicating the switching timing of the two sections.
  • the output unit 116 displays, for each section indicated by the section information, the section name in the vicinity of the graph B1 of the time range corresponding to the section. Further, the output unit 116 specifies the switching timing of the two sections based on the time of the section indicated by the section information. Then, the output unit 116 causes the switching line B3 to be displayed at the time (horizontal axis) position of the graph B1 corresponding to the specified switching timing. Thereby, the output unit 116 can display which section the graph B1 of the amount of speech of each participant U corresponds to each time.
  • In this way, the output unit 116 displays the information indicating the sections set in the discussion superimposed on the temporal change of each participant U's speech amount. Therefore, the analyst can grasp the temporal change of each participant U's speech amount for each section.
  • Further, the output unit 116 can display the temporal change of each participant U's speech amount in a legible manner by determining the order in which the speech amounts of the participants U are stacked in the graph B1 based on each participant U's speech amount.
  • The output unit 116 may switch between displaying the speech amount screen B of FIG. 5 and the speech amount screen B of FIG. 6 according to the analyst's operation, or may display a predetermined one of them.
  • In this case, the output unit 116 calculates the degree of variation (for example, the variance or standard deviation) of each participant U's speech amount in each section, based on the analysis result and the section information read from the analysis result storage unit 133. Then, the output unit 116 generates the graph B1 by stacking the speech amounts of the participants U in ascending order of the degree of variation in each section. The output unit 116 may also determine the stacking order based on the degree of variation over all sections rather than for each section.
  • Thereby, the influence that changes in the speech amount of a participant U placed lower in the stack have on the apparent speech amount of a participant U placed above can be reduced. Further, since the tendency of each participant U's speech amount changes from section to section, changing the stacking order for each section makes the temporal change of the speech amount easier to read.
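  • A minimal sketch of that ordering rule, assuming the per-participant activity series of one section is available as an array per participant; the data layout is an assumption.

```python
from typing import Dict, List
import numpy as np

def stacking_order(activity_by_participant: Dict[str, np.ndarray]) -> List[str]:
    """Return participant IDs in ascending order of activity variance.

    Participants whose speech amount varies least come first, i.e. they are
    placed at the bottom of the stacked graph so that their band disturbs
    the bands above it the least.
    """
    return sorted(activity_by_participant,
                  key=lambda pid: float(np.var(activity_by_participant[pid])))
```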
  • the output unit 116 may display a predetermined event that has occurred during the discussion (that is, within the time of the sound acquired by the sound acquisition unit 112) in the graph B1. Thereby, the analyst can analyze the influence of the occurrence of the event on the volume of each participant U's utterance.
  • The event is, for example, (1) an assistant of the discussion (a teacher, a facilitator, etc.) approaching the group, or (2) a specific remark (word) made by the assistant.
  • the event shown here is an example, and the output unit 116 may display the occurrence of other events that can be recognized by the voice analysis device 100.
  • the output unit 116 uses a signal transmitted and received between the sound collector 10 and the assistants.
  • the assistant holds a transmitter that emits a predetermined signal by radio waves or ultrasonic waves of wireless communication such as Bluetooth (registered trademark), for example, and the sound collection device 10 includes a receiver that receives the signal.
  • The output unit 116 determines that the assistant has approached when the receiver of the sound collection device 10 becomes able to receive the signal from the assistant's transmitter, or when the received signal strength becomes equal to or higher than a predetermined threshold.
  • The output unit 116 determines that the assistant has left when the receiver of the sound collection device 10 can no longer receive the signal from the assistant's transmitter, or when the received signal strength falls below the predetermined threshold.
  • the output unit 116 may use the voiceprint of the assistant (ie, the frequency spectrum of the assistant's voice) to detect the approach of the assistant's group.
  • the output unit 116 registers the voiceprint of the assistant in advance, and detects the voiceprint of the assistant in the voice acquired by the sound collection device 10 during the discussion. Then, the output unit 116 determines that the assistant has approached when detecting the assistant's voiceprint, and determines that the assistant has left when the assistant's voiceprint can not be detected.
  • the output unit 116 performs speech recognition on the speech of the assistant.
  • the assistant holds a sound collector (for example, a pin microphone), and the output unit 116 receives the voice of the assistant acquired by the sound collector held by the assistant.
  • By using a sound collection device held by the assistant separately from the sound collection device 10, the voice of the participants U and the voice of the assistant can be clearly distinguished.
  • the output unit 116 converts the voice acquired from the sound collection device held by the assistant into a character string.
  • The output unit 116 can use a known speech recognition method to convert the speech into a character string. Then, the output unit 116 detects specific words in the converted character string (for example, words related to the progress of the discussion, such as "first", "summary", and "last", and words such as "good" or "bad").
  • the words to be detected are set in the voice analysis device 100 in advance. Then, when the specific word is detected, the output unit 116 determines that the specific word is uttered.
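  • As a minimal sketch of the keyword detection step, assuming the assistant's speech has already been transcribed by a separate speech recognition step; the keyword list and transcript handling are illustrative assumptions.

```python
from typing import List

KEYWORDS = {"first", "summary", "last", "good", "bad"}   # assumed pre-registered words

def detect_keywords(transcript: str) -> List[str]:
    """Return the registered keywords found in a transcript, in order of appearance."""
    words = (w.strip(".,!?") for w in transcript.lower().split())
    return [w for w in words if w in KEYWORDS]

# e.g. detect_keywords("Good. Let's move on to the summary.") -> ["good", "summary"]
```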
  • the output unit 116 may perform speech recognition only before and after the timing at which the change in the amount of speech of each participant U is large. In this case, based on the analysis result read out from the analysis result storage unit 133, the output unit 116 calculates the degree of change in the amount of speech per time (for example, the amount or ratio of change per unit time). The degree of change in the amount of speech may be calculated for each participant U, or may be calculated as the sum of all participants U.
  • Then, the output unit 116 performs speech recognition on the voice acquired by the sound collection device held by the assistant within a predetermined time range (for example, from 5 seconds before to 5 seconds after the timing) that includes a timing at which the degree of change is equal to or higher than a predetermined threshold.
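  • A minimal sketch of selecting the time ranges around large changes in the speech amount, as described above; the threshold, the window half-width, and the use of the summed activity of all participants are illustrative assumptions.

```python
from typing import List, Tuple
import numpy as np

def recognition_windows(times: np.ndarray,
                        total_activity: np.ndarray,
                        threshold: float = 0.2,
                        half_width: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) ranges in which speech recognition should be run.

    times: time stamps of the activity samples in seconds.
    total_activity: summed activity of all participants at each time stamp.
    """
    # Degree of change of the speech amount per unit time.
    change = np.abs(np.diff(total_activity)) / np.diff(times)
    windows = []
    for t in times[1:][change >= threshold]:
        windows.append((max(t - half_width, float(times[0])), t + half_width))
    return windows
```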
  • FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B.
  • In FIG. 7, event information B4 is displayed on the graph B1; otherwise the screen is the same as the speech amount screen B of FIG. 5.
  • The output unit 116 may switch between displaying the speech amount screen B of FIG. 5 and the speech amount screen B of FIG. 7 according to the analyst's operation, or may display a predetermined one of them.
  • the event information B4 is information indicating the content and timing of the event.
  • the event information B4 indicates the content of the event by, for example, a character string indicating that the assistant has approached or left or a character string indicating the speech of the assistant detected by speech recognition. Further, the event information B4 indicates the timing of the event by an arrow indicating the timing at which the event occurs on the graph B1.
  • the output unit 116 displays information indicating the content and timing of an event that has occurred in the discussion, superimposed on the time change of the utterance amount of each participant U. Therefore, the analyst can analyze how the event that occurred during the discussion influenced the time change of the volume of each participant U's utterance.
  • the analyst can evaluate that the teacher has activated the discussion, for example, when the amount of speech increases when the teacher approaches the group.
  • the analyst can also evaluate that the word is a valid word for activating the discussion, for example, when the amount of speech increases when a specific word is issued by the teacher.
  • the output unit 116 can extract and display a graph of a plurality of utterance amounts in the same section.
  • FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the section extraction screen C.
  • the output unit 116 displays the section extraction screen C for the specified section when, for example, the analyst specifies the name B2 of any section on the speech amount screen B in FIGS. 5 to 7.
  • the section extraction screen C is a screen for displaying a result of extracting a graph of the amount of speech of the same section, and includes a graph C1 of the amount of speech, a name C2 of the section, and a name C3 of the group.
  • When displaying the section extraction screen C, the output unit 116 extracts the analysis results and the section information of a plurality of groups for the designated section from the analysis result storage unit 133.
  • the groups to be displayed may be different groups discussed at the same time, or the same or different groups discussed in the past. Then, the output unit 116 generates display information for displaying the time change of the utterance amount of each participant for a plurality of groups in the designated section based on the extracted analysis result and the section information.
  • the graph C1 of the amount of speech is a graph showing the time change of the amount of speech of each participant U in the designated section for each of two or more groups.
  • the display mode of the graph C1 is the same as that of the graph B1.
  • the section name C2 is a character string indicating the name of the designated section.
  • the group name C3 is a name for identifying a group to be displayed, and may be set by the analyst or may be automatically determined by the voice analysis device 100.
  • the output unit 116 displays the graph C1 of two groups, but the graph C1 of three or more groups may be displayed. Also, the output unit 116 may display the names of one or more participants U belonging to the group instead of or in addition to the name C3 of the group.
  • the output unit 116 displays a plurality of graphs indicating temporal change in the amount of speech of each participant in different groups in the same section.
  • This allows the analyst to compare and analyze temporal changes in the volume of speech of different groups for the same section (e.g., the same subject, or the same stage in the discussion). For example, an analyst can grasp the tendency of the volume of utterance for each group by comparing different groups discussed at the same time. Also, for example, the analyst can grasp the change in the tendency of the utterance amount of the same group by comparing a plurality of past discussions of the same section for the same group.
  • the output unit 116 is not limited to the stacked graph as illustrated in FIG. 5, and may display a heat map indicating time change of the amount of speech of each participant U.
  • FIG. 9 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen D.
  • the speech amount screen D includes a heat map D1 of the speech amount, a section name D2, and a section switching line D3.
  • the section name D2 and the section switching line D3 are the same as the section name B2 and the section switching line B3 in FIG.
  • the speech amount heat map D1 displays the amount of speech along time by color.
  • FIG. 9 shows the color difference by the density of the points, for example, the higher the density of the points, the darker the color, and the lower the density of the points, the lighter the color.
  • The output unit 116 takes time along a predetermined direction (for example, the horizontal direction in FIG. 9) and causes the display unit 21 to display, for each participant U, an area whose color corresponds to the amount of speech per unit time.
  • By displaying the heat map instead of the graph, the analyst can likewise grasp the temporal change of each participant U's speech amount for each section.
  • The output unit 116 may switch between displaying the graph of FIG. 5 and the heat map of FIG. 9 according to the analyst's operation, or may display a predetermined one of them.
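  • As an illustrative sketch of the heat map D1 described above, drawn with matplotlib (time on the horizontal axis, one row per participant, darker color for a larger speech amount); participant names, section boundaries, and data are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

participants = ["U1", "U2", "U3", "U4"]
times = np.arange(0, 600, 1.0)
rng = np.random.default_rng(0)
activity = rng.random((len(participants), times.size))   # placeholder analysis results

fig, ax = plt.subplots()
im = ax.imshow(activity, aspect="auto", origin="lower", cmap="Greys",
               extent=[times[0], times[-1], -0.5, len(participants) - 0.5])
ax.set_yticks(range(len(participants)))
ax.set_yticklabels(participants)
ax.set_xlabel("time [s]")
for t_switch in (200.0, 400.0):           # assumed section boundaries (switching line D3)
    ax.axvline(t_switch, color="k", linestyle="--")
fig.colorbar(im, ax=ax, label="speech amount (activity)")
plt.show()
```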
  • FIG. 10 is a sequence diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst, and transmits the setting as setting information to the voice analysis device 100 (S11).
  • the setting unit 111 of the voice analysis device 100 acquires setting information from the communication terminal 20 and causes the setting information storage unit 131 to store the setting information.
  • the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing voice acquisition to the sound collection device 10 (S12).
  • When the sound collection device 10 receives the signal instructing the acquisition of voice from the voice analysis device 100, it starts recording voice using the plurality of sound collection units and transmits the recorded voices of the plurality of channels to the voice analysis device 100.
  • the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
  • the voice analysis device 100 starts voice analysis at one of timings when an analyst gives instructions, when voice acquisition ends, or during voice acquisition (that is, real-time processing).
  • the sound source localization unit 113 performs sound source localization based on the speech acquired by the speech acquisition unit 112 (S14).
  • The analysis unit 114 determines which participant spoke at each time, based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113, and specifies the speech period and the speech amount of each participant (S15).
  • the analysis unit 114 causes the analysis result storage unit 133 to store the utterance period and the utterance amount for each participant.
  • The section setting unit 115 sets one or more sections for the voice corresponding to the discussion to be analyzed (S16). At this time, the section setting unit 115 sets the sections based on at least one of an operation on the communication terminal 20, an operation on the sound collection device 10, and a predetermined sound acquired by the sound collection device 10.
  • The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice for which they are set.
  • The output unit 116 performs control to display the analysis result on the display unit 21 of the communication terminal 20 (S17). Specifically, based on the analysis result by the analysis unit 114 and the section information by the section setting unit 115, the output unit 116 generates display information for the above-described speech amount screen B, section extraction screen C, or speech amount screen D, and transmits it to the communication terminal 20.
  • the communication terminal 20 causes the display unit 21 to display the analysis result in accordance with the display information received from the voice analysis device 100 (S18).
  • the voice analysis device 100 displays the time change of the amount of speech of each participant for each section. Thereby, the analyst can grasp the time change of the amount of speech of each participant for each section.
  • the voice analysis device 100 automatically analyzes the discussions of the plurality of participants based on the voice acquired using the sound collection device 10 having the plurality of sound collection units. Therefore, it is not necessary to have the recorder monitor the discussion as in the Harkness method described in Non-Patent Document 1, and it is not necessary to arrange the recorder for each group, so the cost is low.
  • The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the entities that execute each step (process) included in the voice analysis method shown in FIG. 10. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the voice analysis method shown in FIG. 10 from the storage unit and execute it, and by controlling each part of the voice analysis device 100, the sound collection device 10, and the communication terminal 20, they perform the voice analysis method shown in FIG. 10.
  • the steps included in the speech analysis method shown in FIG. 10 may be partially omitted, the order between the steps may be changed, and a plurality of steps may be performed in parallel.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The objective of the present invention is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system capable of outputting information for analysis based on the temporal variation of the amount of speech uttered by a participant during a discussion. A voice analysis device (100) according to an embodiment of the present invention comprises: a voice acquisition unit (112) that acquires voices uttered by a plurality of participants; an analysis unit (114) that identifies, for each time period, the respective amounts of speech of the plurality of participants in the voices; a section setting unit (115) that sets sections in the voices based on input from a user; and an output unit (116) that outputs a graph in which the temporal changes of the amounts of speech uttered by the respective participants are stacked, together with information indicating the sections in the graph.
PCT/JP2018/000942 2018-01-16 2018-01-16 Dispositif d'analyse vocale, procédé d'analyse vocale, programme d'analyse vocale et système d'analyse vocale Ceased WO2019142231A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/000942 WO2019142231A1 (fr) 2018-01-16 2018-01-16 Dispositif d'analyse vocale, procédé d'analyse vocale, programme d'analyse vocale et système d'analyse vocale
JP2018502279A JP6589040B1 (ja) 2018-01-16 2018-01-16 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/000942 WO2019142231A1 (fr) 2018-01-16 2018-01-16 Dispositif d'analyse vocale, procédé d'analyse vocale, programme d'analyse vocale et système d'analyse vocale

Publications (1)

Publication Number Publication Date
WO2019142231A1 true WO2019142231A1 (fr) 2019-07-25

Family

ID=67300990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/000942 Ceased WO2019142231A1 (fr) 2018-01-16 2018-01-16 Dispositif d'analyse vocale, procédé d'analyse vocale, programme d'analyse vocale et système d'analyse vocale

Country Status (2)

Country Link
JP (1) JP6589040B1 (fr)
WO (1) WO2019142231A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2021245759A1 (fr) * 2020-06-01 2021-12-09
JPWO2023079602A1 (fr) * 2021-11-02 2023-05-11

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008139654A (ja) * 2006-12-04 2008-06-19 Nec Corp 対話状況区切り推定方法、対話状況推定方法、対話状況推定システムおよび対話状況推定プログラム
JP2015028625A (ja) * 2013-06-28 2015-02-12 キヤノンマーケティングジャパン株式会社 情報処理装置、情報処理装置の制御方法、およびプログラム
JP2016206355A (ja) * 2015-04-20 2016-12-08 本田技研工業株式会社 会話解析装置、会話解析方法及びプログラム
JP2017033443A (ja) * 2015-08-05 2017-02-09 日本電気株式会社 データ処理装置、データ処理方法、及び、プログラム
JP2017161731A (ja) * 2016-03-09 2017-09-14 本田技研工業株式会社 会話解析装置、会話解析方法およびプログラム

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008139654A (ja) * 2006-12-04 2008-06-19 Nec Corp 対話状況区切り推定方法、対話状況推定方法、対話状況推定システムおよび対話状況推定プログラム
JP2015028625A (ja) * 2013-06-28 2015-02-12 キヤノンマーケティングジャパン株式会社 情報処理装置、情報処理装置の制御方法、およびプログラム
JP2016206355A (ja) * 2015-04-20 2016-12-08 本田技研工業株式会社 会話解析装置、会話解析方法及びプログラム
JP2017033443A (ja) * 2015-08-05 2017-02-09 日本電気株式会社 データ処理装置、データ処理方法、及び、プログラム
JP2017161731A (ja) * 2016-03-09 2017-09-14 本田技研工業株式会社 会話解析装置、会話解析方法およびプログラム

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUMAN INTERFACE 2015, 1 September 2015 (2015-09-01), pages 939 - 943 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2021245759A1 (fr) * 2020-06-01 2021-12-09
WO2021245759A1 (fr) * 2020-06-01 2021-12-09 ハイラブル株式会社 Dispositif de conférence vocale, système de conférence vocale et procédé de conférence vocale
JP7530070B2 (ja) 2020-06-01 2024-08-07 ハイラブル株式会社 音声会議装置、音声会議システム及び音声会議方法
JP2024147690A (ja) * 2020-06-01 2024-10-16 ハイラブル株式会社 音声会議装置、音声会議システム及び音声会議方法
US12260876B2 (en) 2020-06-01 2025-03-25 Hylable Inc. Voice conference apparatus, voice conference system and voice conference method
JP7766887B2 (ja) 2020-06-01 2025-11-11 ハイラブル株式会社 音声会議装置、音声会議システム及び音声会議方法
JPWO2023079602A1 (fr) * 2021-11-02 2023-05-11
JP7768591B2 (ja) 2021-11-02 2025-11-12 ハイラブル株式会社 音声分析装置及び音声分析方法

Also Published As

Publication number Publication date
JPWO2019142231A1 (ja) 2020-01-23
JP6589040B1 (ja) 2019-10-09

Similar Documents

Publication Publication Date Title
US12118978B2 (en) Systems and methods for generating synthesized speech responses to voice inputs indicative of a user in a hurry
WO2007139040A1 (fr) dispositif de crÉation de donnÉes de situation de discours, dispositif de visualisation de situation de discours, dispositif d'Édition de donnÉes de situation de discours, dispositif de reproduction de donnÉes de discours, et systÈme de communication de discours
US12106766B2 (en) Systems and methods for pre-filtering audio content based on prominence of frequency content
CN113223487B (zh) 一种信息识别方法及装置、电子设备和存储介质
Ramsay et al. The intrinsic memorability of everyday sounds
JP7427274B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP6589040B1 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP6646134B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
KR102077642B1 (ko) 시창평가 시스템 및 그것을 이용한 시창평가방법
JP6589042B1 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
KR101243766B1 (ko) 음성 신호를 이용하여 사용자의 성격을 판단하는 시스템 및 방법
JP6589041B1 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP7768591B2 (ja) 音声分析装置及び音声分析方法
KR102702335B1 (ko) 온라인 음악 활동을 통한 저시력 장애인 심리분석 서버 및 이를 이용한 심리분석 방법
JP2020173415A (ja) 教材提示システム及び教材提示方法
JP6975755B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP7414319B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP6975756B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP2026010176A (ja) 音声分析装置及び音声分析方法
Altaf et al. Perceptually motivated temporal modeling of footsteps in a cross-environmental detection task
KR20240082748A (ko) 음성 녹취에 기초한 심리 분석 방법 및 시스템
KR20200018859A (ko) 스피치 피드백을 위한 웹 서비스 시스템
HK20016739A1 (en) Computerized systems and methods for determining authenticity using micro expressions
KR20160057098A (ko) 회의록 작성 기능을 갖는 인터랙티브 보드 및 이의 운용방법

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018502279

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900614

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18900614

Country of ref document: EP

Kind code of ref document: A1