
TWI394141B - Karaoke song accompaniment automatic scoring method - Google Patents


Info

Publication number
TWI394141B
TWI394141B
Authority
TW
Taiwan
Prior art keywords
score
scale
time
beat
music
Prior art date
Application number
TW098106930A
Other languages
Chinese (zh)
Other versions
TW201034000A (en)
Inventor
Wen Hsin Lin
Original Assignee
Wen Hsin Lin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wen Hsin Lin filed Critical Wen Hsin Lin
Priority to TW098106930A priority Critical patent/TWI394141B/en
Publication of TW201034000A publication Critical patent/TW201034000A/en
Application granted granted Critical
Publication of TWI394141B publication Critical patent/TWI394141B/en


Landscapes

  • Auxiliary Devices For Music (AREA)

Description

Karaoke song accompaniment automatic scoring method

The present invention relates to an automatic scoring method for karaoke song accompaniment, and in particular to an innovative design that derives several component scores — pitch, rhythm, and emotion — and then combines them into a final score by weighted averaging.

In current karaoke machines, the accompaniment usually comes with an automatic scoring function. However, conventional designs of this function often only roughly estimate an overall score, or use the decibel level of the singing voice as the sole basis of evaluation. The scores produced by some machines bear little relation to how well the song was actually sung. Such scoring provides a little entertainment but cannot genuinely judge singing quality, so it is of no real help to a singer who wants to practice.

In view of the problems in the design and use of conventional karaoke accompaniment products described above, developing a more practical and ideal design remains a goal for the industry.

In view of this, the inventor, drawing on many years of experience in the development and design of related products, arrived at the present practical invention after detailed design and careful evaluation.

The main object of the present invention is to provide an automatic scoring method for karaoke song accompaniment. The problem to be solved is that the automatic scoring function of conventional karaoke machines cannot genuinely judge singing quality and therefore does not help singers practice. The technical feature of the invention is that the method compares the singer's pitch, beat positions, and volume against the pitch, beat positions, and volume of the song's main melody to obtain a pitch score, a rhythm score, and an emotion score respectively, and finally computes a weighted total score. Compared with the prior art, this design can accurately compute the singer's pitch, beat-position, and volume errors in every passage of the song, and can display pitch and volume curves so that the singer can easily see where the singing was inaccurate and what needs improvement, giving the method both instructional and entertainment value.

Figures 1 to 16 show preferred embodiments of the automatic scoring method of the present invention; these embodiments are for illustration only and do not limit the scope of the patent application. Broadly, the method compares the singer's pitch, beat positions, and volume with the pitch, beat positions, and volume of the song's main melody to obtain three scoring items — a pitch score, a rhythm score, and an emotion score — and finally computes their weighted total as the automatic score.

When a person sings, apart from the character of the individual voice, judging how well the singing matches the song mainly involves three aspects: pitch sense, rhythm sense, and emotion. Pitch sense is the accuracy of the sung pitch relative to each note. Rhythm sense is the error in beat positions, including onset beats and ending beats. Emotion is the variation in volume, both within each phrase and over the whole song. The methods for obtaining the pitch score, rhythm score, and emotion score are described below.

(1) Pitch score:

As shown in Figure 1, the singer's pitch is computed from the microphone signal at short, regular intervals (for example, every 0.1 seconds). The pitch estimate is the fundamental frequency of the voice, typically obtained with an autocorrelation-function-based method. The fundamental frequency is first converted by a pitch estimator into a note on the musical scale; this note is then compared against the note extracted from the song's main melody, and a pitch score is assigned to that note. Pitch scores are computed for all notes in this way until the song ends, and an average pitch score is then output.

Figure 2 gives the details. First, in the initial parameter setting, the note counter m = 0, the exact-match count NoteHit = 0, and the near-match count NoteHitAround = 0. NoteHit is the number of time intervals during a note in which the voice pitch matches the note exactly; NoteHitAround is the number of intervals in which the voice pitch is within one semitone of the melody note. The melody note for the next interval is then fetched and the voice pitch for that interval computed. The per-note pitch score is determined by NoteHit, NoteHitAround, and the note length NoteLength. The melody notes are taken directly from a file such as MIDI, reading off the note being played at each point in time. The voice pitch (fundamental frequency) is converted to a note via a lookup table: for example, note A4 has frequency 440 Hz, and each octave doubles the frequency, so A5 is 880 Hz. An octave contains 12 semitones, and adjacent semitones differ in frequency by a factor of 2^(1/12). Because a voice whose frequency is 2x, 1/2x, or any power-of-two multiple of the note's frequency has the same pitch class, the computed voice note Note_p is shifted by whole octaves (±12 semitones) relative to the melody note Note_m so that their difference lies between −5 and +6 semitones; that is, Note_p = Note_p + 12·i for an integer i chosen so that −5 <= Note_p − Note_m <= 6.

Next, the method checks whether a new note has begun. If so, it computes the pitch score of the previous note and resets the parameters: NoteHit = 0, NoteHitAround = 0, and m = m + 1. Otherwise it checks whether the melody note and the voice note match exactly, meaning the error is within a small tolerance such as 0.5 semitone; if so it increments NoteHit = NoteHit + 1. Failing that, it checks whether they match within a larger tolerance, such as one semitone, and if so increments NoteHitAround = NoteHitAround + 1. It then returns to fetching the next melody note and computing the next voice pitch. The step "compute the pitch score of the previous note" works as shown in Figure 3: first obtain the length NoteLength(m) of the previous melody note, where m = 0, 1, 2, ..., M.
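The frequency-to-note conversion and octave shift described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; function names are mine, and the MIDI-style numbering (A4 = 440 Hz = note 69) matches the note numbers used later in the worked example.

```python
import math

def freq_to_note(f_hz):
    """Convert a fundamental frequency in Hz to a MIDI-style note number
    (A4 = 440 Hz = note 69; one unit per semitone, so each octave adds 12)."""
    return 69 + 12 * math.log2(f_hz / 440.0)

def fold_to_melody(note_p, note_m):
    """Shift the voice note note_p by whole octaves (+/-12 semitones) so that
    -5 <= note_p - note_m <= 6, as the method requires."""
    while note_p - note_m > 6:
        note_p -= 12
    while note_p - note_m < -5:
        note_p += 12
    return note_p
```

For example, a voice singing A5 (880 Hz, note 81) against a melody note A4 (note 69) is folded down one octave to note 69, an exact pitch-class match.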

Here M is the total number of notes. The method then checks whether the exact-match count NoteHit is greater than zero. If so, the exact-match pitch score is computed as:

PitchScore(m) = PSH + K1 · NoteHit(m) / NoteLength(m)

where PSH and K1 are adjustable empirical parameters. Otherwise the near-match pitch score is computed as:

PitchScore(m) = PSL + K2 · NoteHitAround(m) / NoteLength(m)

where PSL and K2 are adjustable empirical parameters, subject to the constraint 0 <= PitchScore(m) <= 100.

Finally, the method checks whether this is the last note. If not, the steps above repeat; if so, the average pitch score is computed as the weighted average of all PitchScore(m), weighted by note length NoteLength(m). Letting the total note length be NL = Σ_{m=0}^{M−1} NoteLength(m), the average pitch score SOP (Score of Pitch) is:

SOP = (1/NL) · Σ_{m=0}^{M−1} PitchScore(m) · NoteLength(m)
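The per-note and average pitch-score formulas above can be sketched as follows. The default parameter values PSH = 50, K1 = 100, PSL = 35, K2 = 50 are the ones used in the patent's worked example; function names are mine.

```python
def pitch_score(note_hit, note_hit_around, note_length,
                psh=50, k1=100, psl=35, k2=50):
    """Per-note pitch score: exact-match formula when NoteHit > 0,
    otherwise the near-match formula; clamped to [0, 100]."""
    if note_hit > 0:
        s = psh + k1 * note_hit / note_length
    else:
        s = psl + k2 * note_hit_around / note_length
    return max(0, min(100, s))

def average_pitch_score(scores, lengths):
    """SOP: weighted average of per-note scores, weighted by note length."""
    total_length = sum(lengths)
    return sum(s * l for s, l in zip(scores, lengths)) / total_length
```

For instance, a note matched exactly in all of its intervals scores 100, while a note matched only within a semitone for half its length scores 35 + 50 · 0.5 = 60.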

(2) Rhythm score:

The rhythm score is determined by how well the singer's onset beats match the attack times of the melody notes, and how well the singer's ending beats match the note end times. To estimate the singer's beat positions accurately, changes in the singer's pitch are taken as the times at which the singer moves between notes, and beat accuracy is judged from these. As shown in Figure 4, the procedure resembles that of Figure 1: the voice pitch is estimated, the melody notes are obtained, and a rhythm estimator then produces the average rhythm score.

In the rhythm estimator, the voice pitch is first converted to a note on the scale, and the timing error between this note and the corresponding melody note is measured. The timing error covers early or late onset beats and ending beats. The timing error of each note is recorded and a rhythm score assigned to that note; rhythm scores are computed for all notes until the song ends, and an average rhythm score is then output. As shown in Figure 5, a lag matcher and a lead matcher take the converted voice note and the current, previous, and next melody notes, and compute how much the voice lags or leads each note in time, yielding lag and lead times for the onset and ending beats. A per-note rhythm score is then computed from these. Starting from the first note, the rhythm error of each note is computed until the last note ends, and the average rhythm score is then calculated.

As shown in Figure 6, the lag matcher first checks whether a new melody note has begun. If not, it checks whether the onset lag time has already been set; if so it stops, otherwise it checks whether the voice note matches the melody note. If they do not match, the onset lag time is incremented; if they do, the onset lag time is fixed and the matcher stops. This lag represents how much later than the start of the melody note the voice began. If a new melody note has begun, the onset lag time is reset and the end time of the previous note recorded. The matcher then checks whether the voice note matches the previous melody note, and if so keeps checking subsequent voice notes against that previous note until they no longer match, after which the ending lag time is set and the matcher stops. This lag represents how much later than the end of the previous melody note the voice stopped.

As shown in Figure 7, the lead matcher first checks whether a new melody note has begun. If not, it checks whether the voice note matches the current melody note; if so it records the voice note's end time, otherwise it sets the ending lead time and stops. This lead represents how much earlier than the end of the melody note the voice stopped. If a new melody note has begun, the ending lead time is reset and the note's start time recorded. The matcher then checks whether the voice note matches the melody note, and if so keeps checking earlier voice notes against the note until they no longer match, after which the onset lead time is set and the matcher stops. This lead represents how much earlier than the start of the melody note the voice began.

The per-note rhythm score SOB (Score of Beat) is then computed from the onset lag time, onset lead time, ending lag time, and ending lead time. Letting the onset timing error be TDS, the onset beat score SOBS is:

SOBS = As + 100 · (1 − TDS / Ls)

where TDS = onset lag time (NoteOnLag) + onset lead time (NoteOnLead), and As and Ls are preset empirical parameters. Letting the ending timing error be TDE, the ending beat score SOBE is:

SOBE = Ae + 100 · (1 − TDE / Le)

where TDE = ending lag time (NoteOffLag) + ending lead time (NoteOffLead), and Ae and Le are preset empirical parameters. The per-note rhythm score SOB is then:

SOB = SOBS · R + SOBE · (1 − R)

where R is a preset weighting parameter with 0 <= R <= 1.
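The beat-score formulas above can be sketched as follows. The defaults As = 10, Ls = 10, Ae = 50, R = 0.5 come from the worked example (where Le is the note length; a fixed Le = 10 is used here only for simplicity). Clamping SOBS and SOBE to [0, 100] is my assumption — the patent states the 0–100 bound explicitly only for the pitch score — and the function name is mine.

```python
def beat_score(note_on_lag, note_on_lead, note_off_lag, note_off_lead,
               a_s=10, l_s=10, a_e=50, l_e=10, r=0.5):
    """Per-note rhythm score SOB from onset/ending lag and lead times."""
    tds = note_on_lag + note_on_lead          # onset timing error TDS
    tde = note_off_lag + note_off_lead        # ending timing error TDE
    sobs = a_s + 100 * (1 - tds / l_s)        # onset beat score SOBS
    sobe = a_e + 100 * (1 - tde / l_e)        # ending beat score SOBE
    sobs = max(0, min(100, sobs))             # clamp (assumed)
    sobe = max(0, min(100, sobe))
    return sobs * r + sobe * (1 - r)
```

A perfectly timed note (all four errors zero) scores 100; an onset that lags by half of Ls drops the onset component to 60 before weighting.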

(3) Emotion score:

Emotion is difficult to measure objectively; here it is determined by how well the average amplitude of the voice matches the average amplitude of the song's main melody. The emotion score is computed by first obtaining the voice and melody amplitude curves and measuring how well they match over the whole song and within each phrase, as well as how much each phrase's amplitude curve varies relative to the overall amplitude, yielding an average emotion score. The average amplitude of the voice is obtained by computing the RMS (root mean square) value of each voice segment; the average amplitude of the melody can likewise be obtained by computing the RMS value of each melody segment, or taken directly from the amplitude parameters in the synthesized music data. The RMS of a segment is:

RMS = sqrt( (1/K) · Σ_{i=0}^{K−1} x(i)^2 )

where x(i), i = 0, 1, ..., K−1 are the K audio samples of the segment. In practice the RMS value may be replaced by other measures such as the average amplitude or the maximum amplitude. As shown in Figures 8 and 9, the emotion score estimator computes the RMS values of the voice signal and the melody at regular intervals (about 0.1 s), giving RMS sequences for voice and melody, denoted MicVol(n) and MelVol(n), n = 0, 1, ..., N−1, where n indexes the time interval and N is the total number of intervals in the song. MicVol represents the microphone (voice) volume and MelVol the volume of the song's main melody. The energy level of MicVol(n) is normalized to that of MelVol(n), and both sequences are then averaged over the duration of each note, giving the average RMS of the m-th note for melody and voice, AvgMelVol(m) and AvgMicVol(m) respectively. (A note here denotes a pitch; the note's length is its duration.) Specifically:

AvgMelVol(m) = (1/L_m) · Σ_{n=n_m}^{n_m+L_m−1} MelVol(n)
AvgMicVol(m) = (1/L_m) · Σ_{n=n_m}^{n_m+L_m−1} MicVol(n)

where L_m is the length of the m-th note and n_m is the interval at which the m-th note begins.
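The two amplitude measurements above — segment RMS and per-note averaging — can be sketched as follows. Function names are mine; the per-note averaging takes each note's start interval n_m and length L_m as described.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one audio segment x(0..K-1)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def avg_note_volume(vol, note_starts, note_lengths):
    """Average a per-interval RMS sequence (MicVol or MelVol) over each
    note's duration, giving AvgMicVol(m) / AvgMelVol(m)."""
    return [sum(vol[n:n + length]) / length
            for n, length in zip(note_starts, note_lengths)]
```

For example, a two-note song whose RMS sequence is [1, 2, 3, 4], with notes starting at intervals 0 and 2 and each lasting 2 intervals, yields per-note averages [1.5, 3.5].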

From AvgMelVol(m) and AvgMicVol(m) the emotion score SOE (Score of Emotion) can be computed. First, the overall match SOET between the voice amplitude curve and the melody amplitude curve is obtained; it represents the overall emotional-variation score. It is computed from the deviation between AvgMicVol(m) and AvgMelVol(m) over all notes, where M is the total number of notes, and is constructed so that SOET <= 100.

Next, a per-phrase emotion score SOES is computed. AvgMicVol(m) and AvgMelVol(m) are first split into phrases: letting the first note of the j-th lyric phrase be S(j), j = 0, 1, 2, ..., L−1, where L is the total number of phrases, and setting S(L) = M, an emotional-variation score SOES(j) is computed for each phrase over its notes. A relative emotional-variation score SOEA(j) is then computed for each phrase, measuring the change of that phrase's volume relative to the overall volume. From these, the average emotion score is the weighted average of the overall emotional-variation score, the per-phrase emotional-variation scores, and the per-phrase relative emotional-variation scores:

SOE = α · SOET + β · (1/L) · Σ_{j=0}^{L−1} SOES(j) + γ · (1/L) · Σ_{j=0}^{L−1} SOEA(j)

where α, β, and γ are weighting coefficients with α + β + γ = 1.
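The weighted combination above can be sketched as follows. The patent does not give values for α, β, γ (only that they sum to 1), so the defaults here are illustrative, and the function name is mine.

```python
def emotion_score(soet, soes_list, soea_list,
                  alpha=0.34, beta=0.33, gamma=0.33):
    """SOE: weighted average of the overall score SOET, the mean per-phrase
    score SOES, and the mean per-phrase relative score SOEA.
    alpha + beta + gamma must equal 1."""
    soes = sum(soes_list) / len(soes_list)   # mean of SOES(j)
    soea = sum(soea_list) / len(soea_list)   # mean of SOEA(j)
    return alpha * soet + beta * soes + gamma * soea
```

With all component scores at 100, the combined score is 100 regardless of how the unit weight is split.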

(4) Weighted total score (see Figure 9):

From the SOP, SOB, and SOE above, the weighted total score AES (Average Evaluated Score) is obtained as:

AES = p · SOP + q · SOB + r · SOE

where p, q, and r are weighting coefficients with p + q + r = 1.
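The final weighted total can be sketched as follows, using the weights p = 0.6, q = 0.2, r = 0.2 from the worked example below; the function name is mine.

```python
def weighted_total(sop, sob, soe, p=0.6, q=0.2, r=0.2):
    """AES: weighted total of pitch (SOP), rhythm (SOB) and emotion (SOE)
    scores; p + q + r must equal 1."""
    return p * sop + q * sob + r * soe
```

Plugging in the worked example's component scores SOP = 98, SOB = 96.5, SOE = 97.24 reproduces its total of about 97.55.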

Implementation example:

Taking one song as an example, the voice pitch MicPitch(n) and RMS average MicVol(n) are computed every 0.1 seconds, while the pitch of the melody note MelNote(n) is extracted and its RMS average MelVol(n) computed, n = 0, 1, 2, ..., N, where N reflects the total song length. Without loss of generality, and for ease of explanation, N = 280 here, i.e. the song is 28 seconds long. Figure 10 plots MicPitch(n) and MelNote(n). The solid line is the pitch of the melody notes; the vertical axis is the note number, with each integer step one semitone apart — 60 is middle Do (C), 61 is raised middle Do (C#), 69 is middle La (A), and so on. The dots are the pitches computed from the voice, converted to note numbers; each voice pitch has already been shifted by up to ±12 semitones so that it lies closest to the melody note. The solid line is piecewise constant, each segment representing one sustained note and its level the note's pitch. Where the melody note is −1, the note is a rest or empty and is skipped. Where a dot is zero, no pitch could be computed for the voice at that point — the voice may have been unvoiced breath, silence, or noise — and it is treated as no sound.

First, by the pitch-score algorithm above, the exact-match count NoteHit(m) (circles in Figure 11) and near-match count NoteHitAround(m) (triangles in Figure 11) of the m-th note are obtained, m = 0, 1, 2, ..., M, with M = 3 in the figure. As shown in Figure 11, with PSH = 50, K1 = 100, PSL = 35, and K2 = 50, the pitch score of each note m is obtained (rectangles in Figure 11); after the weighted average over note lengths (stars in Figure 11), the average pitch score is ScoreOfPitch (SOP) = 98.

Next, by the rhythm-score algorithm above, NoteOnLag(m) (circles) and NoteOnLead(m) (stars) of the m-th note are obtained as shown in Figure 12; with As = 10 and Ls = 10, BeatOnScore(m) (rectangles) is computed. As shown in Figure 13, NoteOffLag(m) (circles) and NoteOffLead(m) (stars) are obtained; with Ae = 50 and Le = NoteLength (the note length), BeatOffScore(m) (circles) is computed. After the weighted average over note lengths, ScoreOfBeatStart (SOBS) = 93.19 and ScoreOfBeatEnd (SOBE) = 99.82; with R = 0.5, SOB = 96.5.

Then, by the emotion-score algorithm above, the RMS sequences of the melody and the voice are obtained: MelVol(n) (curve L1 in Figure 14) and MicVol(n) (curve L2 in Figure 14), with the energy level of MicVol(n) normalized to that of MelVol(n), as shown in Figure 14. Averaging over each note's length gives the average RMS sequences of the m-th note, AvgMelVol(m) (L3 in Figure 15) and AvgMicVol(m) (L4 in Figure 15), as shown in Figure 15. With the weighting coefficients set, SOET = 98.33 is computed, along with SOES(j) (L5 in Figure 16) and SOEA(j) (L6 in Figure 16) for each phrase j = 0, 1, 2, ..., L−1, the total number of phrases being L = 6, as shown in Figure 16. The averages are SOES = 97.2 and SOEA = 95.67; after weighting, ScoreOfEmotion (SOE) = 97.24.

Finally, with weighting coefficients p = 0.6, q = 0.2, and r = 0.2, the weighted total score is AES = p · SOP + q · SOB + r · SOE = 97.55.

Advantages of the invention:

The automatic scoring method of the present invention obtains a pitch score, a rhythm score, and an emotion score by comparing the singer's pitch, beat positions, and volume with the pitch, beat positions, and volume of the song's main melody, and then computes a weighted total score. With this innovative design, and in contrast to the prior art, the invention accurately computes the singer's pitch, beat-position, and volume errors in every passage of the song, and the displayed pitch and volume curves let the singer easily see where the singing was inaccurate and what needs improvement, achieving practical progress with both instructional and entertainment value.

The above embodiments are provided to illustrate the invention concretely. Although specific terms are used in the description, they do not limit the scope of the patent. Those skilled in the art may, having understood the spirit and principles of the invention, make changes and modifications to equivalent effect, and all such changes and modifications fall within the scope defined by the claims below.

Figure 1: first block diagram of the pitch-score method of the invention.
Figure 2: second block diagram of the pitch-score method of the invention.
Figure 3: third block diagram of the pitch-score method of the invention.
Figure 4: first block diagram of the rhythm-score method of the invention.
Figure 5: second block diagram of the rhythm-score method of the invention.
Figure 6: third block diagram of the rhythm-score method of the invention.
Figure 7: fourth block diagram of the rhythm-score method of the invention.
Figure 8: block diagram of the emotion-score method of the invention.
Figure 9: block diagram of the automatic score estimation method of the invention.
Figure 10: first reference chart for the implementation example.
Figure 11: second reference chart for the implementation example.
Figure 12: third reference chart for the implementation example.
Figure 13: fourth reference chart for the implementation example.
Figure 14: fifth reference chart for the implementation example.
Figure 15: sixth reference chart for the implementation example.
Figure 16: seventh reference chart for the implementation example.

Claims (2)

A karaoke song accompaniment automatic scoring method, which compares the singer's pitch, beat positions, and volume against the pitch, beat positions, and volume of the music's main melody to obtain three scoring items, namely a pitch score, a rhythm score, and an emotion score, and finally computes the weighted total of these items as the automatic rating;

wherein the pitch score is obtained by estimating the singer's pitch from the microphone signal once every short time interval: the fundamental frequency of the voice signal is extracted, converted by a pitch estimator into the corresponding musical note, and compared with the note extracted from the main melody for the same interval; the degree of match yields a per-note pitch score, and this is repeated for every note until the performance ends, whereupon an average pitch score is output; obtaining the pitch score includes an initial parameter setting that initializes the note index m = 0, the exact-match count NoteHit = 0, and the near-match count NoteHitAround = 0, where NoteHit is the number of time segments during a note in which the voice pitch matches the note exactly and NoteHitAround is the number of time segments in which the voice pitch is within one semitone of the note; the main-melody note for the next time interval is then fetched and the voice pitch over that interval computed; the per-note pitch score is determined from the exact-match count NoteHit, the near-match count NoteHitAround, and the note length NoteLength;

wherein the rhythm score is determined by how closely the onset beat of the sung note matches the onset time of the corresponding main-melody note and how closely its ending beat matches that note's end time: a rhythm estimator first converts the voice pitch into the corresponding note and measures its timing error (early or late onset and ending beats) against the note obtained from the main melody, recording the error of each note and assigning a per-note rhythm score; this is repeated for every note until the performance ends and an average rhythm score is output; using a rhythm lag matcher and a rhythm lead matcher, the converted voice note and the current, previous, and next main-melody notes are used to compute the degree to which the voice lags or leads each note in time, yielding the lag and lead times of the note-on and note-off beats, from which the per-note rhythm score is computed; the rhythm error of every note, from the first to the last, is computed in this way and the average rhythm score is then calculated;

wherein the emotion score is determined by how closely the average amplitude of the voice matches the average amplitude of the music's main melody: the average amplitude of the voice is obtained by computing the RMS (root mean square) value of each voice segment, and the average amplitude of the main melody by computing the RMS value of each melody segment or directly from the amplitude parameters in the synthesized music data; an emotion-score estimator computes the RMS values of the voice signal and of the main melody once per time interval to obtain the voice and music RMS sequences, which are then averaged over the length of each note to give per-note average RMS sequences for voice and music; the emotion score is then computed by first obtaining the degree of match between the voice amplitude curve and the music amplitude curve over the whole song and over each phrase, together with the variation of each phrase's amplitude curve relative to the overall amplitude, to yield an average emotion score.
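As a concrete reading of the per-note matching counts in claim 1, the sketch below counts, over the time segments of one melody note, the segments where the estimated voice pitch matches the note exactly (NoteHit) and the segments where it falls within one semitone (NoteHitAround). Treating the near-match count as exclusive of exact matches is an assumption of this sketch; the claim does not state it either way.

```python
def note_match_counts(voice_semitones, note_semitone):
    """Count per-segment matches of the voice against one melody note.

    voice_semitones: estimated voice pitch per time segment, in semitones
                     (e.g. MIDI-style note values derived from the fundamental
                     frequency of the microphone signal).
    note_semitone:   the melody note sounding during those segments.
    Returns (note_hit, note_hit_around).
    """
    note_hit = 0
    note_hit_around = 0
    for s in voice_semitones:
        if round(s) == note_semitone:        # same semitone: exact match
            note_hit += 1
        elif abs(s - note_semitone) <= 1.0:  # off by at most one semitone
            note_hit_around += 1
    return note_hit, note_hit_around

print(note_match_counts([60.1, 59.9, 61.0, 59.2, 63.2], 60))  # (2, 2)
```

These two counts, together with the note length (the number of segments the note spans), feed the per-note pitch-score formula of claim 2.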
The karaoke song accompaniment automatic scoring method of claim 1, wherein if the exact-match count NoteHit is greater than zero, the pitch-match score of the m-th note (abbreviated PitchScore(m)) is computed as

PitchScore(m) = PSH + K1 · NoteHit(m) / NoteLength(m)

where PSH and K1 are adjustable empirical parameters; otherwise the near-match score is computed as

PitchScore(m) = PSL + K2 · NoteHitAround(m) / NoteLength(m)

where PSL and K2 are adjustable empirical parameters, subject to 0 <= PitchScore(m) <= 100; it is then tested whether the note is the last one: if not, the above procedure repeats; if so, the average pitch score is computed as the weighted average of all PitchScore(m) with the note length NoteLength(m) as weight: letting the total note length be NL = Σ_{m=0..M-1} NoteLength(m), the average pitch score SOP (Score of Pitch) is

SOP = (1/NL) · Σ_{m=0..M-1} PitchScore(m) · NoteLength(m);

the per-note rhythm score SOB (Score of Beat) is computed as follows: letting the note-on timing error be TDS, the note-on score SOBS is

SOBS = As + 100 · (1 - TDS/Ls)

where TDS = note-on lag time (NoteOnLag) + note-on lead time (NoteOnLead), and As and Ls are preset empirical parameters; letting the note-off timing error be TDE, the note-off score SOBE is

SOBE = Ae + 100 · (1 - TDE/Le)

where TDE = note-off lag time (NoteOffLag) + note-off lead time (NoteOffLead), and Ae and Le are preset empirical parameters; the per-note rhythm score is then

SOB = SOBS · R + SOBE · (1 - R)

where R is a preset weighting parameter with 0 <= R <= 1;

let the voice amplitude and main-melody amplitude of the n-th time segment be MicVol(n) and MelVol(n), n = 0, 1, ..., N-1, where N is the total number of time segments in the song; averaging these sequences over the length of each note gives the average amplitude sequences of the m-th note for music and voice, AvgMelVol(m) and AvgMicVol(m):

AvgMelVol(m) = (1/L_m) · Σ_{n=n_m..n_m+L_m-1} MelVol(n)
AvgMicVol(m) = (1/L_m) · Σ_{n=n_m..n_m+L_m-1} MicVol(n)

where L_m is the length of the m-th note and n_m the time segment at which the m-th note starts; AvgMelVol(m) and AvgMicVol(m) are used to compute the emotion score SOE (Score of Emotion): first the overall matching degree SOET between the voice amplitude curve and the music amplitude curve is obtained over all M notes, representing the overall emotional-variation score and bounded so that SOET <= 100; next, the per-phrase emotional-variation scores SOES are computed by splitting AvgMicVol(m) and AvgMelVol(m) phrase by phrase: with the starting note of the j-th lyric phrase denoted S(j), j = 0, 1, 2, ..., L-1, where L is the total number of lyric phrases and S(L) = M, the emotional-variation score SOES(j) of each phrase is obtained for j = 0, 1, 2, ..., L-1, after which the relative emotional-variation score of each phrase, i.e. the variation of the phrase's volume relative to the overall volume, is computed; the average emotion score SOE is the weighted average, with coefficients α, β, γ satisfying α + β + γ = 1, of the overall emotional-variation score, the per-phrase emotional-variation scores, and the per-phrase relative emotional-variation scores; from SOP, SOB, and SOE the weighted total score AES (Average Evaluated Score) is obtained as

AES = p · SOP + q · SOB + r · SOE

where p, q, and r are weighting coefficients with p + q + r = 1.
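The per-note pitch-score branch of claim 2 can be sketched directly. PSH, K1, PSL, and K2 are the claim's adjustable empirical parameters; the concrete default values below are assumptions for illustration only.

```python
def pitch_score(note_hit: int, note_hit_around: int, note_length: int,
                psh: float = 60.0, k1: float = 40.0,
                psl: float = 0.0, k2: float = 50.0) -> float:
    """PitchScore(m) per claim 2, clamped to [0, 100].

    If any segment matched exactly:  PSH + K1 * NoteHit / NoteLength
    otherwise:                       PSL + K2 * NoteHitAround / NoteLength
    PSH, K1, PSL, K2 are empirical parameters (illustrative values here).
    """
    if note_hit > 0:
        score = psh + k1 * note_hit / note_length
    else:
        score = psl + k2 * note_hit_around / note_length
    return max(0.0, min(100.0, score))

print(pitch_score(8, 0, 10))   # 60 + 40*0.8 = 92.0
print(pitch_score(0, 5, 10))   # 0 + 50*0.5 = 25.0
```

With these illustrative parameters, a note sung entirely on pitch scores 100 (after clamping), while a note that never matches exactly is capped at PSL + K2, so the two branches keep exact matches strictly more valuable than near matches.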
TW098106930A 2009-03-04 2009-03-04 Karaoke song accompaniment automatic scoring method TWI394141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW098106930A TWI394141B (en) 2009-03-04 2009-03-04 Karaoke song accompaniment automatic scoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW098106930A TWI394141B (en) 2009-03-04 2009-03-04 Karaoke song accompaniment automatic scoring method

Publications (2)

Publication Number Publication Date
TW201034000A TW201034000A (en) 2010-09-16
TWI394141B true TWI394141B (en) 2013-04-21

Family

ID=44855379

Family Applications (1)

Application Number Title Priority Date Filing Date
TW098106930A TWI394141B (en) 2009-03-04 2009-03-04 Karaoke song accompaniment automatic scoring method

Country Status (1)

Country Link
TW (1) TWI394141B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI419150B (en) * 2011-03-17 2013-12-11 Univ Nat Taipei Technology Singing and grading system
JP5958041B2 (en) * 2012-04-18 2016-07-27 ヤマハ株式会社 Expression performance reference data generation device, performance evaluation device, karaoke device and device
CN113744721B (en) * 2021-09-07 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Model training method, audio processing method, device and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200532647A (en) * 2004-03-19 2005-10-01 Sunplus Technology Co Ltd Automatic grading method and device for audio source


Also Published As

Publication number Publication date
TW201034000A (en) 2010-09-16

Similar Documents

Publication Publication Date Title
CN101859560B (en) Automatic Scoring Method for Karaoke Song Accompaniment
WO2010115298A1 (en) Automatic scoring method for karaoke singing accompaniment
CN109979488B (en) Vocal-to-score system based on stress analysis
US8802953B2 (en) Scoring of free-form vocals for video game
CN106095925B (en) A kind of personalized song recommendations method based on vocal music feature
CN102568482A (en) Information processing apparatus, musical composition section extracting method, and program
CN112233691B (en) A singing evaluation method and system
CN104170006A (en) Performance evaluation device, karaoke device, and server device
Devaney et al. A Study of Intonation in Three-Part Singing using the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT).
JP2009104097A (en) Scoring device and program
TWI394141B (en) Karaoke song accompaniment automatic scoring method
Friberg et al. CUEX: An algorithm for automatic extraction of expressive tone parameters in music performance from acoustic signals
TWI419150B (en) Singing and grading system
JP4900017B2 (en) Vibrato detection device, vibrato evaluation device, vibrato detection method, vibrato evaluation method and program
JP6304650B2 (en) Singing evaluation device
JP5125958B2 (en) Range identification system, program
JP6365483B2 (en) Karaoke device, karaoke system, and program
JP2016180965A (en) Evaluation device and program
TW200813977A (en) Automatic pitch following method and system for music accompaniment device
WO2008037115A1 (en) An automatic pitch following method and system for a musical accompaniment apparatus
JP5618743B2 (en) Singing voice evaluation device
JP5416396B2 (en) Singing evaluation device and program
KR102673570B1 (en) Methods and Apparatus for calculating song scores
JP2016080713A (en) Equipment with guitar scoring function
CN1953051B (en) Human voice audio tuning method

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees