HK1034796B - Methods for detecting emotions - Google Patents
Methods for detecting emotions
- Publication number
- HK1034796B (application No. HK01105356.6A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- information
- individual
- spt
- emotional state
- cor
- Prior art date
- 1997-12-16
Description
Technical Field
The invention relates to a method for monitoring an emotional state.
Background
Published PCT application WO97/01984 (PCT/IL96/00027) describes a method for effecting biofeedback modulation of at least one physiological variable characteristic of a subject's emotional state, comprising the steps of: monitoring at least one speech parameter characteristic of the subject's emotional state so as to produce an indicator signal, and using the indicator signal to provide the subject with an indication of the at least one physiological variable. A system may allow the method to be performed in a standalone mode or over a telephone line, in which case the indicator signal may be obtained at a location remote from the subject. Information relating to the subject's emotional state may be conveyed audibly to a remote location or textually via the Internet, and then processed as needed.
A method and apparatus for determining emotional stress in a sequence of utterances is described in published European patent application No. 94850185.3 (publication No. 0 664 537 A2). A model of the speech is generated based on a sequence recognized in the speech, and the difference between the spoken sequence and the modeled speech is obtained by comparing the two.
U.S. Patent 1,384,721 describes a method and apparatus for physiological response analysis.
U.S. Patent 3,855,416 to Fuller describes a method and apparatus for phonation analysis leading to valid truth/lie decisions by fundamental speech-energy weighted vibrato component assessment.
U.S. Patent 3,855,417 to Fuller describes a method and apparatus for phonation analysis leading to valid truth/lie decisions by spectral energy region comparison.
U.S. Patent 3,855,418 to Fuller describes a method and apparatus for phonation analysis leading to valid truth/lie decisions by vibrato component assessment.
The contents of all publications mentioned in the specification and the publications cited therein are incorporated herein by reference.
Disclosure of Invention
It is an object of the present invention to provide an improved apparatus and method for monitoring an emotional state.
There is thus provided in accordance with a preferred embodiment of the present invention an apparatus for detecting an emotional state of an individual, the apparatus including: a voice analyzer operable to input speech samples generated by an individual and to derive therefrom intonation information; and an emotional reporter operable to generate an output indication of an emotional state of the individual based on the intonation information.
According to another preferred embodiment of the invention, the speech samples are provided to the voice analyzer over the telephone.
According to another preferred embodiment of the invention, the output indication of the emotional state of the individual comprises a lie detection report based on the emotional state of the individual.
According to another preferred embodiment of the present invention, the intonation information includes multi-dimensional intonation information.
According to another preferred embodiment of the invention the multi-dimensional information comprises at least 3-dimensional information.
According to another preferred embodiment of the invention the multi-dimensional information comprises at least 4-dimensional information.
According to another preferred embodiment of the invention, the intonation information comprises information about spikes.
According to another preferred embodiment of the present invention, the information on the spike includes a number of spikes within a predetermined time period.
According to another preferred embodiment of the invention, the information about the spikes comprises a distribution of the spikes over time.
According to another preferred embodiment of the present invention, the intonation information includes information about plateaus.
According to another preferred embodiment of the present invention, the information about plateaus includes a number of plateaus within a predetermined time period.
According to another preferred embodiment of the present invention, the information about plateaus includes information about plateau length.
According to another preferred embodiment of the present invention, the information about plateau length includes an average plateau length over a predetermined time period.
According to another preferred embodiment of the present invention, the information about plateau length includes a standard error of the plateau length over a predetermined time period.
There is also provided in accordance with another preferred embodiment of the present invention a lie detection system including: a multi-dimensional voice analyzer operable to input a speech sample generated by an individual and to quantify a plurality of characteristics of the speech sample; and a credibility assessment reporter operable to generate an output indication of the credibility of the individual based on the plurality of quantified characteristics, including detection of a lie.
There is also provided in accordance with another preferred embodiment of the present invention a detection method including: receiving a speech sample generated by an individual, quantifying a plurality of characteristics of the speech sample, and generating an output indication of the credibility of the individual, including detection of a lie, based on the plurality of quantified characteristics.
According to another preferred embodiment of the invention, the speech sample comprises a main speech waveform having a period, and the voice analyzer is operable to analyze the speech sample to determine occurrences of plateaus, each plateau indicating that a local, relatively low-frequency waveform is superimposed on the main speech waveform; the emotion reporter is operable to provide a suitable output indication in dependence on the occurrence of plateaus. For example, the emotion reporter may provide a suitable output indication when a change in the incidence of plateaus is found.
Similarly, each spike represents a local relatively high frequency waveform superimposed on the main speech waveform. One particular advantage of the analysis of plateaus and spikes, as shown and described herein, is that substantially all frequencies of the speech waveform can be analyzed.
There is also provided in accordance with another preferred embodiment of the present invention a method for detecting an emotional state, including: establishing a multidimensional characteristic range representing the individual's range of emotion at rest, by monitoring a plurality of emotion-related parameters of the individual during a first period in which the individual is in an emotionally neutral state and defining the multidimensional characteristic range as a function of the ranges of the plurality of emotion-related parameters during the first period; monitoring the plurality of emotion-related parameters of the individual during a second period in which it is desired to detect the individual's emotional state, thereby obtaining measurements of the plurality of emotion-related parameters; and adjusting the measurements to take the characteristic range into account.
There is also provided in accordance with another preferred embodiment of the present invention a method for detecting an emotional state of an individual, the method including: a speech sample generated by an individual is received, intonation information is derived therefrom, and an output indication of an emotional state of the individual is generated based on the intonation information.
Drawings
The present invention will be further understood from the following detailed description taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1A is a schematic diagram of a system for online monitoring of an emotional state of a speaker;
FIG. 1B is a simplified flow diagram of a preferred method for online monitoring of a speaker's emotional state;
FIG. 2 is a graph of a voice segment including a plurality of spikes;
FIG. 3 is a graph of a voice segment including a plurality of plateaus;
FIG. 4 is a simplified flow diagram of a preferred method for performing step 40 of FIG. 1B;
FIG. 5 is a simplified flowchart of a preferred method for implementing the true/neutral mood profile establishing step of FIG. 1B;
FIG. 6 is a simplified flow diagram of a preferred method for performing step 90 of FIG. 1B on a particular segment;
FIG. 7 is a simplified flow diagram of a preferred method for performing step 100 of FIG. 1B;
FIG. 8 is a simplified flow diagram of a preferred method for performing step 105 of FIG. 1B;
FIG. 9 is a diagram illustrating a screen display of a form, in design mode, prior to launching the application of Appendix A;
FIG. 10 is a schematic diagram depicting a screen display of a form during calibration of a particular subject in the run mode of the system of Appendix A;
FIG. 11 is a schematic diagram depicting a screen display of a form during testing of a subject in the run mode of the system of Appendix A; and
FIG. 12 is a simplified block diagram of a preferred system for performing the method of FIG. 1B.
Detailed Description
The following appendix is provided to aid in the understanding of one preferred embodiment of the invention shown and described herein:
appendix A is a computer-printed program listing of a preferred software implementation of the preferred embodiment of the invention shown and described.
A portion of the content of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent specification, as it appears in the patent and trademark office patent file or records, but otherwise reserves all copyright rights whatsoever.
FIG. 1A is a schematic diagram of a system for online monitoring of an emotional state of a speaker. As shown, the system in this embodiment receives voice input arriving over a telephone line. The system analyzes the speech input to derive an indication of the emotional state of the speaker, which is preferably provided to the user in real time (e.g., on a display screen as shown in the figure).
FIG. 1B is a simplified flow diagram of a preferred method for online monitoring of a speaker's emotional state. The method of FIG. 1B preferably includes the steps of:
Initialization step 10: constants, such as thresholds for the various parameters, are defined, as are the ranges considered to indicate various emotions, as will be described in detail below.
Step 20: voice is recorded periodically or on command. For example, speech is recorded continuously in segments of 0.5 seconds each. Alternatively, overlapping or non-overlapping segments of any suitable length may be used. For example, adjacent segments may overlap almost entirely, differing by only one or a few samples.
The voice recording is digitized.
Additionally or alternatively, overlapping segments of the record may be sampled.
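For orientation, at the recording settings used in Appendix A (11 KHz, 8 bit, mono) a 0.5-second segment holds roughly 5,500 samples. The following minimal sketch of segment indexing is illustrative only; the names and the overlap scheme are assumptions, not taken from Appendix A:

' Illustrative sketch only: segment sizing and indexing.
Const SAMPLE_RATE As Long = 11025       ' 11 KHz, 8 bit, mono (Appendix A settings)
Const SEG_LEN As Long = 5512            ' samples in one 0.5-second segment

' Start sample of segment number k (k = 0, 1, 2, ...), given the number of
' samples by which adjacent segments overlap (0 = non-overlapping segments).
Function SegmentStart(k As Long, overlap As Long) As Long
    SegmentStart = k * (SEG_LEN - overlap) + 1
End Function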
Step 30: the speech segment is analyzed in order to mark its key portion, i.e., the portion deemed to actually contain speech information as opposed to background noise. One suitable criterion for speech information detection is amplitude: for example, the first point whose amplitude exceeds a threshold is deemed the beginning of the speech information, and if no sound exceeding the threshold is found within a predetermined period, the point preceding that silent period is deemed the end of the speech information.
Preferably, the samples in the key portion are normalized, e.g., scaled up to utilize the full amplitude range that the memory can accommodate, e.g., +/-127 amplitude units if an 8-bit memory is used.
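A sketch of one possible realization of this detection and normalization follows. The procedure names are ours; the amplitude barrier of 15 echoes the constant CR_BGlevel of Appendix A, and the run of silence that terminates a portion is an assumed value:

' Sketch: locate the key (speech-bearing) portion of a segment by amplitude.
' SMP() holds signed 8-bit samples (-128..127).
Sub FindKeyPortion(SMP() As Integer, nSamples As Long, startAt As Long, endAt As Long)
    Const THRESH As Integer = 15        ' amplitude barrier (cf. CR_BGlevel)
    Const QUIET_RUN As Long = 1000      ' assumed: samples of silence ending the portion
    Dim a As Long, quiet As Long
    startAt = 0: endAt = 0: quiet = 0
    For a = 1 To nSamples
        If Abs(SMP(a)) > THRESH Then
            If startAt = 0 Then startAt = a    ' first loud sample opens the portion
            endAt = a
            quiet = 0
        ElseIf startAt > 0 Then
            quiet = quiet + 1
            If quiet > QUIET_RUN Then Exit For ' prolonged silence closes the portion
        End If
    Next a
End Sub

' Sketch: scale the key portion up to the full +/-127 range.
Sub NormalizePortion(SMP() As Integer, startAt As Long, endAt As Long)
    Dim a As Long, peak As Integer
    For a = startAt To endAt
        If Abs(SMP(a)) > peak Then peak = Abs(SMP(a))
    Next a
    If peak = 0 Then Exit Sub               ' nothing to scale
    For a = startAt To endAt
        SMP(a) = Int(SMP(a) * 127 / peak)
    Next a
End Sub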
Step 40: spikes and plateaus in the key portion are counted. The length of each identified plateau is computed, and the average plateau length for the key portion and the standard error of the plateau length are calculated.
A "spike" is a notch-shaped feature. For example, the term "spike" may be defined as:
a.3 sequences of adjacent samples, where the first and third samples are higher than the middle sample, or
b.3 sequences of adjacent samples, wherein the first and third samples are lower than the middle sample.
Preferably, a "spike" is declared even if the first and third samples differ only slightly from the middle sample, i.e., there is preferably no minimum threshold on the difference between samples. However, there preferably is a minimum threshold on the baseline of a spike, i.e., spikes occurring at very low amplitudes are disregarded, because they are deemed to relate to background noise rather than to speech.
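A sketch of spike counting under definitions (a) and (b), patterned on the "thorn" scan of scan_TJ in Appendix A (the baseline barrier of 15 mirrors the constant used there; the function name is ours):

' Sketch: count spikes ("thorns") in the key portion SMP(startAt..endAt).
Function CountSpikes(SMP() As Integer, startAt As Long, endAt As Long) As Long
    Const BASELINE As Integer = 15      ' ignore spikes at near-silent amplitudes
    Dim a As Long, n As Long
    For a = startAt To endAt - 2
        ' middle sample is a local peak, all three samples above the baseline
        If SMP(a) < SMP(a + 1) And SMP(a + 1) > SMP(a + 2) Then
            If SMP(a) > BASELINE And SMP(a + 1) > BASELINE And SMP(a + 2) > BASELINE Then n = n + 1
        End If
        ' middle sample is a local trough, all three samples below the negative baseline
        If SMP(a) > SMP(a + 1) And SMP(a + 1) < SMP(a + 2) Then
            If SMP(a) < -BASELINE And SMP(a + 1) < -BASELINE And SMP(a + 2) < -BASELINE Then n = n + 1
        End If
    Next a
    CountSpikes = n
End Function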
Fig. 2 is a graph of a speech segment 32, including a plurality of spikes 34.
"plateau" refers to local flatness in the voice waveform. For example, a plateau may be defined as a flat sequence whose length is greater than a predetermined minimum threshold and less than a predetermined maximum threshold. A maximum threshold is required for distinguishing the period of local flatness from silence. A sequence may be considered flat if the difference in amplitude between consecutive samples is less than a predetermined threshold (e.g., if an 8-bit memory is used that is less than 5 amplitude cells).
Fig. 3 is a graph of a speech segment 36, including a plurality of plateaus 38. In Appendix A, plateaus are termed "jumps".
The system of the present invention generally operates in one of two modes:
a. Calibration - a profile of the subject's true/neutral emotional state is established by monitoring the subject while the subject is not lying and/or is in a neutral emotional state.
b. Testing - the subject's speech is compared to the profile of the subject's true/neutral emotional state established during calibration, in order to determine the subject's emotional state and/or whether or not the subject is being truthful.
If the system is to be used in a calibration mode, the method proceeds from step 50 to step 60. If the system is to be used in a test mode, the method proceeds from step 50 to step 80.
Step 60: if step 60 is reached, the current segment is being processed for calibration purposes. Accordingly, the spike and plateau information derived in step 40 is stored in a calibration table.
The processing of steps 20-50 is referred to herein as "voice recording input processing". If there are more voice recordings to be entered for calibration purposes, the method returns to step 20. If all voice recordings for calibration purposes have been entered (step 70), the method proceeds to step 80.
Step 80: a profile of the true/neutral emotional state of the subject currently being tested is established. This completes operation in the calibration mode. Subsequently, the system enters the test mode, in which the subject's voice recordings are compared to his true/neutral emotion profile in order to identify occurrences of lying or of strong emotion. The subject's profile typically reflects the central tendency of the spike/plateau information and is typically adjusted to account for artifacts of the calibration situation. For example, initial voice recordings may be less reliable than subsequent ones, owing to natural stress at the beginning of the calibration process. Preferably, extreme entries in the calibration table are discarded in order to obtain a reliable indication of the central tendency.
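By way of illustration only, a simple trimmed mean over the m calibration entries can realize this; the rule of dropping one minimum and one maximum is an assumption made for the sketch, not taken from Appendix A:

' Sketch: central tendency of one calibration column with extremes discarded.
Function TrimmedMean(vals() As Single, m As Long) As Single
    Dim a As Long, tot As Single, vmin As Single, vmax As Single
    vmin = vals(1): vmax = vals(1)
    For a = 1 To m
        tot = tot + vals(a)
        If vals(a) < vmin Then vmin = vals(a)
        If vals(a) > vmax Then vmax = vals(a)
    Next a
    If m > 2 Then
        TrimmedMean = (tot - vmin - vmax) / (m - 2)  ' drop one extreme at each end
    Else
        TrimmedMean = tot / m                        ' too few entries to trim
    End If
End Function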
The remaining steps, from step 90 onward, belong to the test mode.
Step 90: the spike/plateau information of the current segment is compared to the true/neutral mood profile calculated in step 80.
Step 100: a threshold decision is made on the results of the comparison process of step 90 to classify the current segment as being indicative of various emotions and/or lies.
Step 105: optionally, "delay" is compensated for. The term "delay" here refers to a residual emotional state that carries over from an "actual" emotional state occasioned by a first perceived situation, and that lingers for a period of time after the first perceived situation has ended. An example of a suitable implementation of step 105 is described below with reference to the flowchart of FIG. 8.
Step 110: a message indicating the category determined in step 100 is displayed.
Step 120: if there are other speech segments to analyze, then return to step 20. Otherwise, ending. Any suitable number m of segments may be used for calibration, for example 5 segments.
FIG. 4 is a simplified flow diagram of a preferred method for performing step 40 of FIG. 1B. As described above, at step 40, spike/plateau information is generated for a critical portion of the current voice recording segment.
The current plateau length is termed "jj".
The number of plateaus whose length is exactly jj is termed "jjmap(jj)".
"Plat" is a counter that counts the number of plateaus regardless of their length.
"Thorn" is a counter that counts the number of spikes.
n is the number of samples in the key portion being tested.
At step 150, the spike and plateau counters are reset.
At step 160, a loop over all the samples of the key portion begins. The loop runs from the first key sample to the last key sample minus 2.
At step 164, the amplitude of the samples in the loop is recorded.
At steps 170 and 180, spikes are detected; at steps 190, 195, 200 and 210, plateaus are detected.
At step 200, if the length of a candidate plateau lies within reasonable bounds, e.g., between 3 and 20, the count of plateaus of length jj is incremented, as is the total plateau count Plat. Otherwise, i.e., if the candidate plateau is shorter than 3 samples or longer than 20 samples, it is not deemed a plateau.
Regardless of whether or not the candidate plateau is deemed a "true" plateau, the plateau length jj is zeroed (step 210).
Step 220 is the end of the loop, i.e., the point at which all samples in the sequence have been examined.
At step 230, the average plateau length (AVJ) and the standard error (JQ) of the plateau length are calculated from the plateau length tally jjmap.
At step 240, SPT and SPJ are calculated. SPT is the number of spikes per sample, preferably suitably normalized. SPJ is the number of plateaus per sample, preferably suitably normalized.
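The quantities of steps 230 and 240 can be written out as follows. This is a slightly simplified sketch patterned on scan_TJ of Appendix A (the procedure name is ours; Appendix A normalizes by the constants CONS_RES_SPT = CONS_RES_SPJ = 2):

' Sketch: derive AVJ, JQ, SPT and SPJ from the tallies of FIG. 4.
' jjmap(jj) = number of plateaus of length jj; Thorn = spike count;
' Plat = plateau count; n = number of samples in the key portion.
Sub DeriveFactors(jjmap() As Long, Thorn As Long, Plat As Long, n As Long, _
                  AVJ As Single, JQ As Single, SPT As Single, SPJ As Single)
    Dim jj As Long, tot As Long, dev As Single
    For jj = 1 To 100
        tot = tot + jjmap(jj) * jj             ' total plateau length
    Next jj
    If Plat > 0 Then AVJ = tot / Plat          ' average plateau length
    For jj = 1 To 100
        dev = dev + jjmap(jj) * Abs(AVJ - jj)  ' accumulated deviation from AVJ
    Next jj
    JQ = Sqr(dev)                              ' standard error of plateau length
    SPT = Int((Thorn / n) * 1000) / 2          ' spikes per sample (CONS_RES_SPT = 2)
    SPJ = Int((Plat / n) * 1000) / 2           ' plateaus per sample (CONS_RES_SPJ = 2)
End Sub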
According to the illustrated embodiment, the emotional state detection is multi-dimensional, i.e. the emotional state is derived from the speech information by a plurality of intermediate variables, preferably independent of each other.
Fig. 5 is a simplified flowchart of a preferred method for implementing the true/neutral mood profile establishing step of fig. 1B.
In fig. 5, SPT (i) is the SPT value of segment i.
MinSPT is the minimum SPT value measured in any one of the m segments.
MaxSPT is the maximum SPT value measured in any of the m segments.
MinSPJ is the minimum SPJ value measured in any one of the m segments.
MaxSPJ is the maximum SPJ value measured in any of the m segments.
MinJQ is the minimum JQ value measured in any one of the m segments.
MaxJQ is the maximum JQ value measured in any one of the m segments.
ResSPT is the size of the range of SPT values encountered during calibration. More generally, ResSPT may comprise any suitable indication of the degree of variation in the number of spikes to be expected when the subject is in a true/neutral emotional state. Accordingly, if the number of spikes in a speech segment falls outside the range indicated by ResSPT, the subject may be said to be in a non-neutral emotional state, such as a state of excitement or even arousal. ResSPT therefore typically serves as an input to the process of evaluating SPT values generated in circumstances where the emotional state is unknown.
ResSPJ is the size of the range of SPJ values encountered during calibration. More generally, ResSPJ may comprise any suitable indication of the degree of variation in the number of plateaus to be expected when the subject is in a true/neutral emotional state. Accordingly, if the number of plateaus in a speech segment falls outside the range indicated by ResSPJ, the subject may be said to be in a non-neutral emotional state, such as one characterized by a feeling of internal contradiction or cognitive dissonance. ResSPJ therefore typically serves as an input to the process of evaluating SPJ values generated in circumstances where the emotional state is unknown.
ResJQ is the size of the range of JQ values encountered during calibration, and serves as a baseline for the evaluation of JQ values generated in circumstances where the emotional state is unknown.
It should be understood that the baseline need not necessarily be a 4-dimensional baseline as shown in FIG. 5, but may even be 1-dimensional or greater than 4-dimensional.
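For the SPT dimension, for example, the baseline range may be computed as follows (a sketch; Calibrate() in Appendix A stores the ratio of the extremes in res_T, and the other dimensions are handled analogously):

' Sketch: baseline range of one dimension over the m calibration segments.
Sub BaselineSPT(SPT() As Single, m As Long, _
                minSPT As Single, maxSPT As Single, ResSPT As Single)
    Dim i As Long
    minSPT = SPT(1): maxSPT = SPT(1)
    For i = 1 To m
        If SPT(i) < minSPT Then minSPT = SPT(i)
        If SPT(i) > maxSPT Then maxSPT = SPT(i)
    Next i
    If minSPT > 0 Then ResSPT = maxSPT / minSPT  ' ratio form, as in Appendix A (res_T)
End Sub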
FIG. 6 is a simplified flow diagram of a preferred method for performing step 90 of FIG. 1B on a particular segment. As described above, in step 90, the spike/plateau information of the current segment is compared to the true/neutral emotional baseline calculated in step 80.
Step 400 is an initialization step.
Step 410 calculates the deviation of the current key portion from the subject's previously calculated true/neutral emotional state profile. In the illustrated embodiment, the deviation comprises a 4-dimensional value including a first component related to the number of spikes, a second component related to the number of plateaus, a third component related to the standard error of the plateau length, and a fourth component related to the average plateau length. However, it should be understood that different components may be employed in different applications. For example, in some applications, the distribution of the spikes over a time interval (uniform, erratic, etc.) may be used to derive information about the subject's emotional state.
"BreakpointT" is a threshold representing an acceptable range for the ratio between the average number of spikes in the true/neutral emotional state and the number of spikes in the current key portion.
"BreakpointJ" is a threshold representing an acceptable range for the ratio between the average number of plateaus in the true/neutral emotional state and the number of plateaus in the current key portion.
"BreakpointQ" is a threshold representing an acceptable range for the ratio between the standard error of the plateau length in the true/neutral emotional state and the standard error of the plateau length in the current key portion.
"BreakpointA" is a threshold representing an acceptable range for the ratio between the average plateau length in the true/neutral emotional state and the average plateau length in the current key portion.
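Expressed as code, each deviation component has the form used in analyze() of Appendix A (the helper name Deviation is ours):

' Sketch: normalized deviation of one dimension from its calibrated average.
Function Deviation(calAvg As Single, current As Single, _
                   breakpoint As Single, res As Single) As Single
    If current = 0 Then current = 1                 ' guard, as in analyze()
    Deviation = ((calAvg / current) - breakpoint) / res
End Function
' e.g. zzSPT = Deviation(CAL_spT, CoR_spT, 1.1, ResSPT)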
Steps 420-450: the subject's true/neutral emotional state profile is updated, if appropriate. In the illustrated embodiment, only the ResSPT and ResSPJ values are updated, and only if the current key portion deviates from the subject's previously calculated true/neutral profile either very greatly (e.g., by more than a predetermined maximum deviation) or very little (e.g., by less than some predetermined, typically negative, minimum deviation). If the deviation of the current key portion from the true/neutral profile is neither very large nor very small (e.g., falls between the maximum and minimum values), the subject's profile is typically left unchanged at this stage.
In steps 460 and 470, if zzSPT and zzSPJ, respectively, are very close to zero, the sensitivity of the system is increased by decrementing ResSPT and ResSPJ, respectively.
Step 480 generates suitable, typically application-specific, combinations of the plurality of deviation components computed in step 410. These combinations serve as the basis for suitable emotion classification criteria, such as those specified in FIG. 7. The emotion classification criteria of FIG. 7 determine whether to classify the subject as exaggerating, untruthful, evasive, confused or unsure, excited, or sarcastic. However, it should be understood that different emotion classifications may be employed in different situations.
In the illustrated embodiment, the SPT information is used primarily to determine the excitement level. More specifically, zzSPT is used to determine the value of crEXCITE, which may also depend on additional parameters such as crSTRESS. For example, crEXCITE values between 70 and 120 may be considered normal, while values between 120 and 160 may be considered indicative of moderate excitation, while values above 160 may be considered indicative of high excitation.
In the illustrated embodiment, the SPJ information is used primarily to determine a feeling of cognitive dissonance. For example, zzSPJ values between 0.6 and 1.2 may be deemed normal, while values between 1.2 and 1.7 may be deemed to indicate confusion or uncertainty. Values exceeding 1.7 are believed to indicate the subject's awareness of his own voice and/or an attempt by the subject to control his voice.
In the illustrated embodiment, the zzJQ and crSTRESS values are used primarily to determine the stress level. For example, a crSTRESS value between 70 and 120 may be deemed normal, while a value exceeding 120 is deemed to indicate high stress.
In the illustrated embodiment, the AVJ information is used to determine the amount of thought invested in spoken words or sentences. For example, if crTHINK exceeds 100, more thought was invested in the last sentence than was invested during the calibration phase, i.e., the speaker is thinking more about what he is now saying than he did during calibration. If the value is less than 100, the speaker is investing less thought in what he is now saying than he did during the calibration phase.
In the illustrated embodiment, the crLIE parameter is used to determine truthfulness. A crLIE value below 50 may be deemed to indicate untruthfulness, values between 50 and 60 sarcasm or humor, values between 60 and 130 truthfulness, values between 130 and 170 inaccuracy or exaggeration, and values exceeding 170 untruthfulness.
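As an illustration, the crLIE ranges above can be gathered into a single classification routine. This is a sketch only; the actual decision logic in decision() of Appendix A combines several factors through the CR_msgCode bit mask:

' Sketch: map a crLIE value onto the verbal categories given above.
Function LieCategory(crLIE As Single) As String
    Select Case crLIE
        Case Is < 50: LieCategory = "untruthful"
        Case Is < 60: LieCategory = "sarcasm or humor"
        Case Is <= 130: LieCategory = "truthful"
        Case Is <= 170: LieCategory = "inaccuracy or exaggeration"
        Case Else: LieCategory = "untruthful"
    End Select
End Function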
Referring to fig. 6, the above parameters may receive the following values:
BreakpointT=BreakpointJ=BreakpointQ=BreakpointA=1.1
CeilingT=CeilingJ=1.1
FloorJ=FloorT=-0.6
IncrementT=IncrementJ=DecrementT=DecrementJ=0.1
MinimalT=MinimalJ=0.1
It should be understood that all of these numerical values are merely examples and generally depend on the application.
Fig. 7 shows a method for converting the various parameters into messages that may be displayed, as in the example of FIG. 1A.
Fig. 8 shows a method for fine-tuning the true/neutral emotional state profile.
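In Appendix A this fine-tuning amounts to a slowly moving weighted average of the calibration factors, applied at the end of each tested segment in CHECK():

' Fine-tune the true/neutral profile after each tested segment (from CHECK()).
CAL_spT = ((CAL_spT * 6) + CoR_spT) \ 7      ' spikes: weighted 6:1 toward history
CAL_spJ = ((CAL_spJ * 6) + CoR_SPJ) \ 7      ' plateaus: weighted 6:1
CAL_JQ = ((CAL_JQ * 9) + CoR_QJUMP) \ 10     ' plateau-length error: weighted 9:1
CAL_AVJ = ((CAL_AVJ * 9) + CoR_AVjump) / 10  ' average plateau length: weighted 9:1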
Appendix A is a computer-printed program listing of a software implementation of the preferred embodiment of the invention shown and described, which differs somewhat from the embodiment shown and described herein with reference to the drawings.
One suitable method for generating this software implementation is as follows:
a. On a PC equipped with a microphone, a sound card, and Visual Basic™ version 5 software, a new project is created.
The recording settings of the sound card may operate according to the following parameters: 11KHz, 8bit, mono, PCM.
b. A timer object is placed on the default form that appears in the new project. This timer object is called "Timer1".
c. An MCI multimedia control object is placed on the form. This object is called "MMControl1".
d. 5 label objects are placed on the form. These labels are called Label1, Label2, Label3, Label4, and Label6.
e. 4 label arrays are created on the form. The arrays are renamed SPT(0..4), SPJ(0..4), JQ(0..4), and AVJ(0..4).
f. A command button is placed on the form and its Caption property is changed to "END". This command button is called "Command1".
g. The code for the form is generated by typing in the pages of text under "Form1" in Appendix A.
h. A module is added to the project. The code for the module is generated by typing in the contents of the pages under "Feelings detector" in Appendix A.
i. The microphone is connected to the PC.
j. Press (F5) or "run" to launch the application.
FIG. 9 is a diagram illustrating a screen display of a form prior to launching the application of appendix A in a design mode.
FIG. 10 is a schematic diagram depicting a screen display of a form during calibration of a particular subject in the run mode.
FIG. 11 is a diagram depicting a screen display of a form during testing of a subject in the run mode.
The values of the CoR_msgX variable in Appendix A are as follows:
1 - truth, 2 - sarcasm, 3 - excitement, 4 - confusion/uncertainty, 5 - high excitement, 6 - voice manipulation, 7 - lie/false statement, 8 - exaggeration/inaccuracy.
Variables that carry data for the current key portion have names beginning with the characters "CoR_".
The names of the baseline factors begin with the characters "CAL_".
The names of the breakpoint factors begin with the characters "BP_".
ResSPT and ResSPJ are termed res_T and res_J, respectively.
Fig. 12 is a simplified functional block diagram of a system for detecting emotional states, constructed and operative in accordance with a preferred embodiment of the present invention, for performing the method of FIG. 1B. As shown, the system of FIG. 12 includes a voice input device, such as a tape recorder 700, microphone 710, or telephone 720, which generates speech input to the emotion detection workstation 735 through A/D converter 740. A voice window recorder 750 typically partitions the incoming speech-representing signals into voice windows or segments, which are analyzed by a voice window analyzer 760. The voice window analyzer compares the voice windows or segments to calibration data stored in unit 770. As described in detail above, the calibration data is typically derived individually for each subject. A display unit or printer 780 is provided for displaying or printing an emotional state report, preferably on-line, for the user of the system.
It will be appreciated that the software portion of the invention may be implemented in ROM (read only memory) form, if desired. The software portions of the present invention can generally be implemented in hardware using conventional techniques, if desired.
It is to be understood that the specific embodiments described in the appendix are intended to provide an extremely detailed disclosure of the invention only and are not intended to limit the invention.
It is appreciated that various features of the invention which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. The scope of the invention should be limited only by the attached claims.
Appendix A
The following code should be written to the form object:
Form1

Private Sub Command1_Click()
    End
End Sub

Private Sub Form_Load()
    'Set properties needed by MCI to open.
    a = mciSendString("setaudio waveaudio algorithm pcm bitspersample to 8 bytespersec to 11025 input volume to 100 source to average", 0, 0, 0)
    MMControl1.Notify = False
    MMControl1.Wait = True
    MMControl1.Shareable = False
    MMControl1.DeviceType = "WaveAudio"
    MMControl1.filename = "C:\buf.WAV"
    'Open the MCI WaveAudio device.
    MMControl1.Command = "Open"
    'Define constants
    CR_BGlevel = 15                      'Background level barrier
    CR_BGfilter = 3                      'Vocal wave smoother
    CR_DATAstr = ""                      'Reset data string
    CR_mode = 1
    CONS_SARK = 50
    CONS_LIE11 = 130: CONS_LIE12 = 175
    CONS_LowzzT = -0.4: CONS_HighzzT = 0.3
    CONS_LowzzJ = 0.2: CONS_HighzzJ = 0.7
    CONS_RES_SPT = 2: CONS_RES_SPJ = 2
    CONS_BGfilter = 3
    'Set timer object to work every 0.5 sec
    Timer1.Interval = 500
    Timer1.Enabled = True
    'Set display
    Label1.Caption = "System decision"
    Label2.Caption = "Global stress:"
    Label3.Caption = "Excitement:"
    Label4.Caption = "Lie stress:"
    MMControl1.Visible = False
End Sub

Private Sub Timer1_Timer()
    Static been
    On Error Resume Next
    MMControl1.Command = "stop"
    MMControl1.Command = "save"
    MMControl1.Command = "close"
    'Read data from file
    ff = MMControl1.filename
    Dim kk As String * 6500
    kk = Space(6500)
    Open ff For Binary Access Read As #1
    Get #1, 50, kk
    Close #1
    Kill ff
    MMControl1.Command = "open"
    a = MMControl1.ErrorMessage
    MMControl1.Command = "record"
    CR_DATAstr = kk
    If OP_stat = 0 Then
        OP_stat = 1                      'First round or after recalibration demand
        been = 0
    End If
    If been < 5 Then
        Label1.Caption = "Calibrating.."
        Call Calibrate                   'Perform calibration
        'Get calibration status via CoR_msgX
        If CoR_msgX > -1 Then            'Good sample
            been = been + 1
        End If
        Exit Sub
    Else
        OP_stat = 2                      'Checking status
        Call CHECK                       'Get segment status via CoR_msgX
    End If
    If CoR_msgX < 0 Then Exit Sub        'Not enough good samples
    Label4.Caption = "Lie stress:" + Format(Int(CR_LIE))
    Label2.Caption = "Global stress:" + Format(Int(CR_STRESS))
    Label3.Caption = "Excite Rate:" + Format(Int(CR_EXCITE))
    Label6.Caption = "Thinking Rate:" + Format(Int(CR_THINK))
    been = been + 1
    Select Case CoR_msgX
        Case 0
            ans = "background noise"
        Case 1
            ans = "TRUE"
        Case 2
            ans = "Outsmart"
        Case 3
            ans = "Excitement"
        Case 4
            ans = "Uncertainty"
        Case 5
            ans = "High excitement"
        Case 6
            ans = "Voice manipulation/Avoidance/Emphasizing"
        Case 7
            ans = "LIE"
        Case 8
            ans = "Inaccuracy"
    End Select
    Label1.Caption = ans
End Sub

Sub Calibrate()
    Call CUT_sec
    If CR_noSMP < 800 Then               'No samples
        CoR_msgX = -1
        Exit Sub
    End If
    'Scan thorns
    CONS_RES_SPT = 2
    CONS_RES_SPJ = 2
    Call scan_TJ
    If Int(CoR_spT) = 0 Or Int(CoR_AVjump) = 0 Or Int(CoR_QJUMP) = 0 Or Int(CoR_SPJ) = 0 Then
        CoR_msgX = -1
        Exit Sub
    End If
    tot_T = 0: tot_J = 0: tot_JQ = 0: tot_avj = 0
    minspT = 1000: minspJ = 1000: minJQ = 1000
    For a = 0 To 4
        If SPT(a).Caption = 0 And SPJ(a).Caption = 0 Then
            SPT(a).Caption = Int(CoR_spT)
            SPJ(a).Caption = Int(CoR_SPJ)
            JQ(a).Caption = Int(CoR_QJUMP)
            AVJ(a).Caption = Int(CoR_AVjump)
            Exit For
        End If
        tot_T = tot_T + SPT(a)
        tot_J = tot_J + SPJ(a)
        tot_JQ = tot_JQ + JQ(a)
        tot_avj = tot_avj + AVJ(a)
        If Val(SPT(a).Caption) < minspT Then minspT = Val(SPT(a).Caption)
        If Val(SPT(a).Caption) > maxspT Then maxspT = Val(SPT(a).Caption)
        If Val(SPJ(a).Caption) < minspJ Then minspJ = Val(SPJ(a).Caption)
        If Val(SPJ(a).Caption) > maxspJ Then maxspJ = Val(SPJ(a).Caption)
        If Val(JQ(a).Caption) < minJQ Then minJQ = Val(JQ(a).Caption)
        If Val(JQ(a).Caption) > maxJQ Then maxJQ = Val(JQ(a).Caption)
    Next a
    'Calc current CAL factors
    CAL_spT = (tot_T + Int(CoR_spT)) / (a + 1)
    CAL_spJ = (tot_J + Int(CoR_SPJ)) / (a + 1)
    CAL_JQ = (tot_JQ + Int(CoR_QJUMP)) / (a + 1)
    CAL_AVJ = (tot_avj + Int(CoR_AVjump)) / (a + 1)
    'Calc resolution per factor
    On Error Resume Next
    If a > 1 Then
        res_T = maxspT / minspT
        res_J = maxspJ / minspJ
    End If
    CoR_msgX = 0
End Sub

Sub CHECK()
    Call CUT_sec
    If CR_noSMP < 800 Then               'No samples
        CoR_msgX = -1
        Exit Sub
    End If
    CONS_RES_SPT = 2
    CONS_RES_SPJ = 2
    Call scan_TJ
    If Int(CoR_spT) = 0 Or Int(CoR_AVjump) = 0 Or Int(CoR_QJUMP) = 0 Or Int(CoR_SPJ) = 0 Then
        CoR_msgX = -1
        Exit Sub
    End If
    Call analyze
    Call decision
    'Fine tune cal factors
    CAL_spT = ((CAL_spT * 6) + CoR_spT) \ 7
    CAL_spJ = ((CAL_spJ * 6) + CoR_SPJ) \ 7
    CAL_JQ = ((CAL_JQ * 9) + CoR_QJUMP) \ 10
    CAL_AVJ = ((CAL_AVJ * 9) + CoR_AVjump) / 10
End Sub
The following code should be written to a new module object:
Feelings detector

'Declaration section
Global Fname                             'File name
Global CR_BGfilter                       'Background filter
Global CR_BGlevel                        'Background level
Global CR_DATAstr
Global CR_noSMP                          'Number of samples
Global res_J, res_T
Global CoR_spT, CoR_SPJ, CoR_AVjump, CoR_QJUMP
Global CoR_msgX, CR_retDATAstr
Global SMP(10000) As Integer
Global OP_stat
'** Calibration factors
Global CAL_spJ, CAL_spT
Global CAL_JQ, CAL_AVJ
Global BP_J, BP_T                        'Calibration break points
Global WI_J, WI_T, WI_JQ                 'Weight of factors in calc.
Global CR_zzT, CR_zzJ
Global CR_STRESS, CR_LIE, CR_EXCITE, CR_THINK
Global CR_RESfilter                      'Resolution filter
'Constants for decision
Global CONS_SARK
Global CONS_LIE11, CONS_LIE12
Global CONS_LowzzT, CONS_HighzzT
Global CONS_LowzzJ, CONS_HighzzJ
Global CONS_RES_SPT, CONS_RES_SPJ

Declare Function mciSendString Lib "winmm.dll" Alias "mciSendStringA" (ByVal lpstrCommand As String, ByVal lpstrReturnString As String, ByVal uReturnLength As Long, ByVal hwndCallback As Long) As Long

Sub analyze()
    On Error Resume Next
    CR_LIE = 0
    CR_STRESS = 0
    CR_EXCITE = 0
    If (CoR_spT = 0 And CoR_SPJ = 0) Or CR_noSMP = 0 Then
        CR_msg = "ERROR"
        Exit Sub
    End If
    If CoR_SPJ = 0 Then CoR_SPJ = 1
    If CoR_spT = 0 Then CoR_spT = 1
    On Error Resume Next
    rrJ = res_J: rrT = res_T
    BP_J = 1.1: BP_T = 1.1
    zz_spj = (((CAL_spJ / Int(CoR_SPJ)) - BP_J) / rrJ)
    If zz_spj > -0.05 And zz_spj < 0.05 Then res_J = res_J - 0.1
    If res_J < 1.3 Then res_J = 1.3
    If zz_spj < -0.6 Then
        zz_spj = -0.6
        res_J = res_J + 0.1
    End If
    If zz_spj > 1.2 Then
        zz_spj = 1.2
        res_J = res_J + 0.1
    End If
    If res_J > 3.3 Then res_J = 3.3
    CR_zzJ = zz_spj
    zz_spT = (((CAL_spT / CoR_spT) - BP_T) / rrT)
    CR_zzT = zz_spT
    If zz_spT > -0.05 And zz_spT < 0.05 Then res_T = res_T - 0.1
    If res_T < 1.3 Then res_T = 1.3
    If zz_spT < -0.6 Then
        zz_spT = -0.6
        res_T = res_T + 0.1
    End If
    If zz_spT > 1.2 Then
        zz_spT = 1.2
        res_T = res_T + 0.1
    End If
    If res_T > 3.3 Then res_T = 3.3
    WI_J = 6: WI_T = 4
    CR_STRESS = Int((CoR_QJUMP / CAL_JQ) * 100)
    ggwi = WI_J * WI_T
    CR_LIE = ((zz_spT + 1) * WI_T) * ((zz_spj + 1) * WI_J)
    CR_LIE = ((CR_LIE / ggwi)) * 100
    CR_LIE = CR_LIE + Int((CoR_QJUMP - CAL_JQ) * 1.5)
    CR_THINK = Int((CoR_AVjump / CAL_AVJ) * 100)
    CR_EXCITE = ((((((CR_zzT) / 2) + 1) * 100) * 9) + CR_STRESS) / 10
    '********* END OF Phase 2 *********
    If CR_LIE > 210 Then CR_LIE = 210
    If CR_EXCITE > 250 Then CR_EXCITE = 250
    If CR_STRESS > 300 Then CR_STRESS = 300
    If CR_LIE < 30 Then CR_LIE = 30
    If CR_EXCITE < 30 Then CR_EXCITE = 30
    If CR_STRESS < 30 Then CR_STRESS = 30
End Sub

Sub CUT_sec()
    CR_noSMP = 0
    If CR_DATAstr = "" Then
        CR_msg = "ERROR! - No data provided"
        Exit Sub
    End If
    CR_AUTOvol = 1                       'Auto amplifier
    CoR_volume = 3                       'Default
    CR_minSMP = 800                      'Default
    free = FreeFile
    'Break CR_DATAstr to bytes
    LocA = 1: LocB = 1
    BGAmin = 0
    BGAmax = 0
    MAXvolume = 0
    TestP = 0
    BR_LOW = -128
    BR_high = -128
    ddd = -128
    ddd = Int(ddd * (CoR_volume / 3))
    ddd = (ddd \ CR_BGfilter) * CR_BGfilter
    If CR_AUTOvol = 1 Then               'Apply auto volume detect
        MAXvolume = 0
        For a = 1 To Len(CR_DATAstr)
            ccc = Asc(Mid$(CR_DATAstr, a, 1))
            ccc = ccc - 128
            ccc = (ccc \ CR_BGfilter) * CR_BGfilter
            If (ccc > CR_BGlevel Or ccc < 0 - CR_BGlevel) And ccc > ddd Then
                If Abs(ccc) > MAXvolume Then MAXvolume = Abs(ccc)
                If StartPos = 0 Then StartPos = a
                OKsmp = OKsmp + 1
            End If
            If MAXvolume > 110 Then Exit For
        Next a
        If OKsmp < 10 Then
            CR_msg = "Not enough samples!"
            CR_noSMP = 0
            Exit Sub
        End If
        CoR_volume = Int(360 / MAXvolume)
        If CoR_volume > 16 Then CoR_volume = 3
    End If
    On Error Resume Next
    drect = "": DR_flag = 0
    MAXvolume = 0
    LocA = 0
    Done = 0
89  For a = StartPos To Len(CR_DATAstr) - 1
        ccc = Asc(Mid$(CR_DATAstr, a, 1)): ccd = Asc(Mid$(CR_DATAstr, a + 1, 1))
        ccc = ccc - 128: ccd = ccd - 128
        ccc = Int(ccc * (CoR_volume / 3))
        ccd = Int(ccd * (CoR_volume / 3))
        ccc = (ccc \ CR_BGfilter) * CR_BGfilter
        ccd = (ccd \ CR_BGfilter) * CR_BGfilter
        If (ccc > CR_BGlevel Or ccc < 0 - CR_BGlevel) And ccc > ddd Then
            If Abs(ccc) > MAXvolume Then MAXvolume = Abs(ccc)
            fl = fl + 1
        End If
        If fl > 5 Then
            SMP(LocA) = ccc
            If BR_high < ccc Then BR_high = ccc
            If BR_LOW > ccc Or BR_LOW = -128 Then BR_LOW = ccc
            If (SMP(LocA) > 0 - CR_BGlevel And SMP(LocA) < CR_BGlevel) Or SMP(LocA) = ddd Then
                blnk = blnk + 1
            Else
                blnk = 0
            End If
            If blnk > 1000 Then
                LocA = LocA - 700
                Done = 1
                If LocA > CR_minSMP Then Exit For
                Done = 0
                LocA = 0
                fl = 2: blnk = 0
                BR_LOW = -128: BR_high = -128
            End If
            LocA = LocA + 1
        End If
    Next a
    Err = 0
    CR_noSMP = LocA
    If CR_noSMP < CR_minSMP Then
        CR_msg = "Not enough samples!"
        Exit Sub
    End If
    CR_msg = "Completed O.K."
End Sub

Sub decision()
    If CR_zzT = 0 And CR_zzJ = 0 And (CAL_spJ <> Int(CoR_SPJ)) Then
        CR_msg = "ERROR! - Required parameters missing!"
        Exit Sub
    End If
    If CR_STRESS = 0 Or CR_LIE = 0 Or CR_EXCITE = 0 Then
        CR_msg = "ERROR! - Required calculations missing!"
        Exit Sub
    End If
    CR_msgCode = 0
    CoR_msgX = 0
    sark = 0
    If CR_LIE < 60 Then
        CoR_msgX = 2
        Exit Sub
    End If
5555 If ((CR_zzJ + 1) * 100) < 65 Then
        If ((CR_zzJ + 1) * 100) < 50 Then sark = 1
        CR_zzJ = 0.1
    End If
    If ((CR_zzT + 1) * 100) < 65 Then
        If ((CR_zzT + 1) * 100) < CONS_SARK Then sark = sark + 1
        CR_zzT = 0.1
    End If
    LIE_BORD1 = CONS_LIE11: LIE_BORD2 = CONS_LIE12
    If CR_LIE < LIE_BORD1 And CR_STRESS < LIE_BORD1 Then
        CR_msgCode = CR_msgCode + 1
    End If
    If CR_LIE > LIE_BORD1 And CR_LIE < LIE_BORD2 Then
        CoR_msgX = 8
        Exit Sub
    End If
    If CR_LIE > LIE_BORD2 Then
        If CR_msgCode < 128 Then CR_msgCode = CR_msgCode + 128
    End If
    If CR_zzJ > CONS_LowzzJ Then
        If CR_zzJ > CONS_HighzzJ Then
            CR_msgCode = CR_msgCode + 64
        Else
            CR_msgCode = CR_msgCode + 8
        End If
    End If
    If CR_EXCITE > LIE_BORD1 Then
        If CR_EXCITE > LIE_BORD2 Then
            If (CR_msgCode And 32) = False Then CR_msgCode = CR_msgCode + 32
        Else
            If (CR_msgCode And 4) = False Then CR_msgCode = CR_msgCode + 4
        End If
    End If
    If CR_msgCode < 3 Then
        If sark = 2 Then
            CR_msgCode = -2
            CoR_msgX = 2
            Exit Sub
        End If
        If sark = 1 Then
            If (CR_zzT > CONS_LowzzT And CR_zzT < CONS_HighzzT) Then
                CR_msgCode = -1
                CoR_msgX = 2
            Else
                If CR_zzT > CONS_HighzzT Then CoR_msgX = 7
            End If
            If (CR_zzJ > CONS_LowzzT And CR_zzJ < CONS_HighzzT) Then
                CR_msgCode = -1
                CoR_msgX = 2
            Else
                If CR_zzJ > CONS_HighzzT Then CoR_msgX = 7
            End If
            Exit Sub
        End If
        CR_msgCode = 1
        CoR_msgX = 1
        Exit Sub
    End If
    If CR_msgCode > 127 Then
        CoR_msgX = 7
        Exit Sub
    End If
    If CR_msgCode > 67 Then
        CoR_msgX = 8
        Exit Sub
    End If
    If CR_msgCode > 63 Then
        CoR_msgX = 6
        Exit Sub
    End If
    If CR_msgCode > 31 Then
        CoR_msgX = 5
        Exit Sub
    End If
    If CR_msgCode > 7 Then
        CoR_msgX = 4
        Exit Sub
    End If
    If CR_msgCode > 3 Then
        CoR_msgX = 3
        Exit Sub
    End If
    CoR_msgX = 1
End Sub

Sub scan_TJ()
    ReDim jjump(100)
    CR_msg = ""
    TestP = CR_noSMP
    CoR_spT = 0
    CoR_SPJ = 0
    If TestP <= 0 Then
        CR_msg = "No. of samples not transmitted!"
        Exit Sub
    End If
    CR_minJUMP = 3                       'Default
    CR_maxJUMP = 20                      'Default
    jump = 0
    thorns = 0
    BIGthorns = 0
    For a = 1 To CR_noSMP
        jjt1 = SMP(a): jjt2 = SMP(a + 1): jjt3 = SMP(a + 2)
        'Scan thorns
        If (jjt1 < jjt2 And jjt2 > jjt3) Then
            If jjt1 > 15 And jjt2 > 15 And jjt3 > 15 Then thorns = thorns + 1
        End If
        If (jjt1 > jjt2 And jjt2 < jjt3) Then
            If jjt1 < -15 And jjt2 < -15 And jjt3 < -15 Then thorns = thorns + 1
        End If
        If (jjt1 > jjt2 - 5) And (jjt1 < jjt2 + 5) And (jjt3 > jjt2 - 5) And (jjt3 < jjt2 + 5) Then
            sss = sss + 1
        Else
            If sss >= CR_minJUMP And sss <= CR_maxJUMP Then
                jump = jump + 1
                jjump(sss) = jjump(sss) + 1
            End If
            sss = 0
        End If
    Next a
    AVjump = 0
    JUMPtot = 0
    CoR_QJUMP = 0
    For a = 1 To 100
        JUMPtot = JUMPtot + jjump(a)
        AVjump = AVjump + (jjump(a) * a)
    Next a
    If JUMPtot > 0 Then cr_AVjump = AVjump / JUMPtot
    For a = 1 To 100
        If jjump(a) > 1 Then QJUMP = QJUMP + ((jjump(a) * Abs(cr_AVjump - a))) '* jjump(a))
    Next a
    CoR_spT = (Int(((thorns) / CR_noSMP) * 1000) / CONS_RES_SPT)
    CoR_SPJ = (Int(((jump) / CR_noSMP) * 1000) / CONS_RES_SPJ)
    CoR_QJUMP = Sqr(QJUMP)
    CoR_AVjump = cr_AVjump
    CR_msg = "Thorns & Jumps scan completed O.K"
End Sub
Claims (19)
1. A method for detecting an emotional state of an individual, the method comprising:
receiving speech samples generated by an individual and deriving therefrom intonation information; and
generating an indication of an emotional state of the individual based on the intonation information,
wherein the intonation information comprises spike-related information, the generating step comprising calculating an excitement level of the individual in a current key portion defined in at least one current segment, based on the spike-related information, and generating an indication of the excitement level.
2. The method of claim 1, wherein the speech samples are provided over a telephone.
3. The method according to claim 1, wherein said indication of the emotional state of the individual comprises a lie detection report based on the emotional state of the individual.
4. The method according to claim 1, wherein said intonation information comprises multidimensional intonation information.
5. The method of claim 4, wherein the multi-dimensional information comprises at least 3-dimensional information.
6. The method of claim 5, wherein the multi-dimensional information comprises at least 4-dimensional information.
7. The method of claim 1, wherein the information about the spikes comprises a number of spikes in a predetermined period of time.
8. The method of claim 7, wherein the information about the spikes comprises a distribution of the spikes over time.
9. The method according to claim 1, wherein said intonation information comprises information about plateaus.
10. The method of claim 9, wherein the information about the plateaus comprises a number of plateaus in a predetermined period of time.
11. The method according to claim 1, wherein said intonation information is associated with the length of each plateau, respectively, and comprises an average of said plateau lengths over a predetermined period of time.
12. The method according to claim 1, wherein said intonation information is associated with the length of each plateau, respectively, and comprises a standard error of the plateau lengths over a predetermined period of time.
13. The method of claim 1, wherein the speech sample comprises a main speech waveform having a period, the receiving step comprising analyzing the speech sample to determine the incidence of plateaus, each plateau indicating a local low frequency waveform superimposed on the main speech waveform; and
the generating step includes providing an appropriate indication based on the incidence of plateaus.
14. The method according to claim 1, wherein
The receiving step comprises quantifying a plurality of features of a speech sample generated by the individual; and
the generating step includes generating a lie detection indication based on the plurality of quantized features.
15. The method according to claim 1, wherein
The receiving step includes establishing a multi-dimensional feature range characterized by a range of emotions of the individual when calm, as follows:
monitoring the individual for a plurality of mood-related parameters during a first time that the individual is in a mood-neutral state; and
defining the characteristic range as a function of the range of mood-related parameters in the first time,
wherein the generating step comprises monitoring the individual for the mood-related parameter during a second time when the emotional state of the individual is to be detected, thereby obtaining a measure of the plurality of mood-related parameters, and adjusting the measure to take into account the range.
16. The method according to claim 1, wherein said intonation information further comprises information about plateau length.
17. The method of claim 14, wherein the excitement level comprises a decreasing function of (a) a number of spikes in at least a portion of the speech sample and (b) a deviation of the plateau lengths in said portion.
18. The method of claim 1, wherein the excitement level in the key portion is calculated as a function of a ratio between calSPT and the SPT value of the current segment, calSPT representing an average of m SPT values in m calibration segments, m representing a number of voice recording segments of a subject used for calibration, for which subject a profile of the true/neutral emotional state is established, each SPT value comprising a function of a ratio between a number of spikes detected in each key portion of a segment and a number of samples n contained in each key portion.
19. The method of claim 1, wherein the excitement level in the key portion is calculated as a sum of (a) a function of a ratio between calSPT and the SPT value of the current segment, and (b) a function of a ratio between calJQ and the JQ value of the current segment, wherein
calSPT represents an average of m SPT values in m calibration segments, m representing a number of voice recording segments of the subject used for calibration, for which subject a profile of the true/neutral emotional state is established,
each SPT value comprises a function of a ratio between a number of spikes detected in each key portion of a segment and a number of samples n contained in each key portion;
calJQ represents an average of m JQ values in m calibration segments; and
each JQ value represents the square root of the accumulated deviation of the plateau lengths in each key portion of a segment.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL12263297A IL122632A0 (en) | 1997-12-16 | 1997-12-16 | Apparatus and methods for detecting emotions |
| IL122632 | 1997-12-16 | ||
| PCT/IL1998/000613 WO1999031653A1 (en) | 1997-12-16 | 1998-12-16 | Apparatus and methods for detecting emotions |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1034796A1 HK1034796A1 (en) | 2001-11-02 |
| HK1034796B true HK1034796B (en) | 2005-04-29 |
Similar Documents
| Publication | Title |
|---|---|
| CN1174373C (en) | Method for detecting emotions |
| CN1152365C (en) | Apparatus and method for pitch tracking |
| Dubnov | Generalization of spectral flatness measure for non-gaussian linear processes |
| Riede et al. | Nonlinear acoustics in the pant hoots of common chimpanzees (Pan troglodytes): vocalizing at the edge |
| CN1248190C (en) | Fast frequency-domain pitch estimation |
| CN1311422C (en) | Voice recognition estimating apparatus and method |
| EP1222448B1 (en) | System, method, and article of manufacture for detecting emotion in voice signals by utilizing statistics for voice signal parameters |
| CN101411171B (en) | Non-intrusive signal quality assessment |
| US7353167B2 (en) | Translating a voice signal into an output representation of discrete tones |
| US7490038B2 (en) | Speech recognition optimization tool |
| US20060165239A1 (en) | Method for determining acoustic features of acoustic signals for the analysis of unknown acoustic signals and for modifying sound generation |
| CN1308911C (en) | Method and system for identifying status of speaker |
| CN1910651A (en) | System for detection section including particular acoustic signal, method and program thereof |
| CN118197303B (en) | Intelligent speech recognition and sentiment analysis system and method |
| CN1282952A (en) | Speech coding method and device, input signal discrimination method, speech decoding method and device and program providing medium |
| CN1193159A (en) | Speech encoding and decoding method and apparatus, telephone set, tone changing method and medium |
| US20080281599A1 (en) | Processing audio data |
| CN1795491A (en) | Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method |
| JP5151102B2 (en) | Voice authentication apparatus, voice authentication method and program |
| JPH08286693A (en) | Information processing device |
| Rosenzweig et al. | Computer-assisted analysis of field recordings: A case study of Georgian funeral songs |
| JP4654621B2 (en) | Voice processing apparatus and program |
| CN101030374A (en) | Method and apparatus for extracting base sound period |
| HK1034796B (en) | Methods for detecting emotions |
| CN1967657A (en) | System and method for automatic tracking and transposition of speaker's voice in program production |