
TW201025289A - Singing system with situation sound effect and method thereof - Google Patents

Singing system with situation sound effect and method thereof

Info

Publication number
TW201025289A
TW201025289A TW97150672A
Authority
TW
Taiwan
Prior art keywords
voice
parameter
singing
song
sound effect
Prior art date
Application number
TW97150672A
Other languages
Chinese (zh)
Other versions
TWI377559B (en)
Inventor
Jim-Wang Chen
Po-Ling Chang
Hilda Wang
Original Assignee
Inventec Besta Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Co Ltd filed Critical Inventec Besta Co Ltd
Priority to TW97150672A priority Critical patent/TWI377559B/en
Publication of TW201025289A publication Critical patent/TW201025289A/en
Application granted granted Critical
Publication of TWI377559B publication Critical patent/TWI377559B/en

Links

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A singing system with situational sound effects and a method thereof are disclosed. A threshold value is calculated by analyzing a user's singing voice, a video image of the user, and the tune of the song being played; the threshold value is then compared with the threshold intervals corresponding to the respective situational sound effects, and the sound effect suited to the playing song is loaded according to the comparison result. This mechanism increases the entertainment value of singing.

Description

IX. DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a singing system and a method thereof, and more particularly to a singing system with situational sound effects, and a method thereof, capable of analyzing a singing voice, a video image, and the tune of a song to load a suitable sound-effect voice.

[Prior Art]

In recent years, with the rapid development of the semiconductor industry and the growing leisure and entertainment consciousness of consumers, many entertainment activities that once required going out, such as karaoke, can now be enjoyed at home with a karaoke machine. Over time, however, users are no longer satisfied with a machine that merely plays song accompaniment, so adding functions to the karaoke machine has become a problem that manufacturers are eager to solve.

Generally speaking, karaoke is entertaining not only because singing lets one vent emotions, but also because interaction with other listeners matters greatly. A person singing alone at home receives no audience feedback such as cheering or applause, so the experience lacks entertainment value. In view of this, some manufacturers have proposed singing over a network, which lets the singer interact with online listeners. However, not every place has easy network access, and listeners willing to hear one sing are not always available, so online singing is still insufficient to solve the problem of inadequate entertainment.

In summary, the prior art has long suffered from insufficient entertainment value in singing, so it is necessary to propose improved technical means to solve this problem.

SUMMARY OF THE INVENTION

In view of the problems of the prior art, the present invention discloses a singing system with situational sound effects and a method thereof.

The singing system with situational sound effects disclosed by the invention comprises a song database, a singing module, a voice analysis module, an image recognition module, a tune analysis module, a processing module, and a sound-effect module.
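Read as a data flow, the components named in the summary form a three-analyzer pipeline: each analyzer reduces one input stream (singing voice, video image, song tune) to a scalar parameter, the processing module fuses the loaded parameters into a threshold value, and the sound-effect module maps that value onto a threshold interval. The following is a minimal sketch of the fusion and interval lookup; the function names and the plain-average fusion are illustrative assumptions, since the patent deliberately leaves the exact computation open.

```python
from typing import Dict, Optional, Tuple


def fuse_parameters(emotion: Optional[float],
                    expression: Optional[float],
                    tune: Optional[float]) -> float:
    """Fuse whichever parameters were loaded into one threshold value.

    The description leaves the arithmetic open ("four arithmetic
    operations or a Fourier transform"); a plain average of the loaded
    parameters is used here purely for illustration.
    """
    loaded = [p for p in (emotion, expression, tune) if p is not None]
    if not loaded:
        raise ValueError("at least one parameter must be loaded")
    return sum(loaded) / len(loaded)


def pick_effect(threshold: float,
                intervals: Dict[Tuple[int, int], str]) -> Optional[str]:
    """Return the sound-effect file whose inclusive (low, high) interval
    contains the threshold value, or None when no interval matches."""
    for (low, high), effect_file in intervals.items():
        if low <= threshold <= high:
            return effect_file
    return None


# Example mapping in the style of the description: thresholds 1-100
# trigger "a.mp3", thresholds 101-200 trigger "b.wav".
intervals = {(1, 100): "a.mp3", (101, 200): "b.wav"}
print(pick_effect(fuse_parameters(20, None, 10), intervals))  # -> a.mp3
```

Using a half-open or inclusive interval is a design choice; the inclusive form above matches the "1~100" ranges quoted later in the description.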

The song database stores song voices and sound-effect voices, each sound-effect voice corresponding to a threshold interval. The singing module receives a selection condition and plays one of the song voices accordingly. The voice analysis module receives a singing voice and analyzes it with a speech algorithm to generate an emotion parameter. The image recognition module captures a video image and recognizes it with a facial algorithm to generate an expression parameter. The tune analysis module analyzes the song voice being played with a spectrum algorithm and generates a tune parameter from the analysis result. The processing module loads, according to a loading condition, at least one of the emotion parameter, the expression parameter, and the tune parameter to calculate a threshold value. The sound-effect module compares the threshold value with the threshold intervals and, according to the comparison result, loads and plays the sound-effect voice of the matching interval.

As for the singing method with situational sound effects of the invention, its steps include: providing song voices and sound-effect voices, each sound-effect voice corresponding to a threshold interval; receiving a selection condition and playing one of the song voices accordingly; receiving a singing voice and analyzing it with a speech algorithm to generate an emotion parameter; capturing a video image and recognizing it with a facial algorithm to generate an expression parameter; analyzing the song voice being played with a spectrum algorithm to generate a tune parameter; loading at least one of the emotion, expression, and tune parameters according to a loading condition to calculate a threshold value; and comparing the threshold value with the threshold intervals and playing the sound-effect voice of the matching interval according to the comparison result.

The difference between the disclosed system and method and the prior art is that the invention calculates a threshold value by analyzing the singing voice, the video image, and the tune of the song, and compares this threshold value with the threshold intervals corresponding to the sound-effect voices, so that a suitable sound-effect voice is loaded and played according to the comparison result. Through this technical means, the invention achieves the technical effect of making singing more entertaining.

DETAILED DESCRIPTION

The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and practiced.

Before describing the disclosed system and method, their application environment is explained. The invention can be applied to an electronic device connected to a sound-receiving device, a camera device, and a loudspeaker device. The electronic device stores song voices and sound-effect voices and has a digital signal processing unit (Digital Signal Processor, DSP) for processing the sound and video images obtained by the sound-receiving device and the camera device. In practice, this unit can be realized by software, hardware, or a combination of both; in the present invention the digital signal processing unit is composed of the voice analysis module, the image recognition module, the tune analysis module, and the processing module, whose processing flow is detailed later with reference to the drawings.

Referring to FIG. 1, a block diagram of the singing system with situational sound effects of the invention, the system comprises a song database 101, a singing module 102, a voice analysis module 103, an image recognition module 104, a tune analysis module 105, a processing module 106, and a sound-effect module 107, together with hardware such as a sound-receiving device 110 and a camera device 111. The sound-receiving device 110, e.g., a microphone, receives sound in real time during singing (for example, the user's singing voice), and the camera device 111, e.g., a video camera, captures video images in real time during singing.

The song database 101 stores the song voices and sound-effect voices, each sound-effect voice corresponding to a threshold interval. Both are multimedia files (for example, files with extensions such as "mp3" or "wav"); a song voice provides the accompaniment for the user to sing along with, while a sound-effect voice provides feedback sounds such as cheering or applause. For example, suppose a sound-effect voice "a.mp3" corresponds to the threshold interval "1~100"; the correspondence can even be realized by using the threshold interval as the file name of the sound-effect voice, e.g., "1~100.mp3". This correspondence scheme is for illustration only; the invention does not limit the way a sound-effect voice corresponds to its threshold interval.

The singing module 102 receives a selection condition and plays one of the song voices accordingly. The selection condition can be entered by pressing function keys or by clicking. For example, suppose each song voice in the song database 101 corresponds to a number serving as its track number; after the track number of a song voice is entered through a keyboard, the singing module 102 takes the entered track number as the selection condition, loads the selected song voice from the song database 101, and plays it for the user to sing along with.

The voice analysis module 103 receives the singing voice and analyzes it with a speech algorithm to generate an emotion parameter; after receiving the singing voice, it may further filter out the background sound of the singing voice according to a filtering parameter. The singing voice is received through the sound-receiving device 110 (e.g., a microphone), and the speech algorithm analyzes the received singing voice and generates a corresponding emotion parameter. For example, if the singing voice is loud and rapid, the voice analysis module 103 analyzes it and generates a corresponding emotion parameter such as the value "80"; if the singing voice is plaintive and drawn out, the analysis likewise yields a corresponding emotion parameter such as the value "20". In this example, a loud, rapid voice yields a larger emotion-parameter value than a plaintive, drawn-out one, so the magnitude of the emotion parameter reveals the character of the singing voice and can be used to infer the singer's emotion (for example, loud and rapid suggests "happy"; plaintive and drawn out suggests "sad"). In other words, the voice analysis module 103 quantizes the singing voice into the parameter value of the emotion parameter according to the speech algorithm; since speech algorithms are well known, they are not elaborated here.

The image recognition module 104 captures the video image and recognizes it with a facial algorithm to generate an expression parameter. The video image is received through the camera device 111. The facial algorithm comprises three stages, face detection, feature extraction, and expression recognition, and can recognize at least six basic expressions: happiness, sadness, anger, surprise, disgust, and fear. Since facial algorithms are well known, they are not elaborated here. In practice, the six basic expressions can be defined as different values serving as the parameter value of the expression parameter; for example, happiness, sadness, anger, surprise, disgust, and fear can be defined as the values "1", "2", "3", "4", "5", and "6" respectively, so that when the image recognition module 104 recognizes the video image as "happy" according to the facial algorithm, the generated expression parameter has the value "1", and so on.

The tune analysis module 105 analyzes the song voice being played with a spectrum algorithm and generates a tune parameter from the analysis result. The spectrum algorithm computes at least one of volume, pitch, and timbre; that is, at least one of the volume, pitch, and timbre of the song voice being played is analyzed to recognize the melody of the song voice and generate the tune parameter. For example, if the sound intensity (volume) of the song voice is low, its frequency (pitch) is low, and its timbre is soft, the tune analysis module 105 analyzes the song voice according to the spectrum algorithm and generates a corresponding tune parameter such as the value "10"; if the intensity is high, the frequency is high, and the timbre is sharp, a corresponding tune parameter such as the value "90" is generated. In this example, a gentle song voice yields a smaller tune-parameter value than an impassioned one. These examples are for convenience of description only; the invention does not limit the way the tune parameter is calculated. Spectrum algorithms are likewise well known in the field of audio analysis and are not elaborated here.

The processing module 106 loads, according to a loading condition, at least one of the emotion parameter, the expression parameter, and the tune parameter to calculate a threshold value. The loading condition can be a preset parameter value: for example, the value "1" means loading only the emotion parameter; the value "2" means loading only the expression parameter; the value "3" means loading the emotion and expression parameters; the value "4" means loading only the tune parameter; and so on up to the value "7", which means loading the emotion, expression, and tune parameters. The loading condition can be set by the user by pressing function keys or by cursor selection. In addition, the threshold value may be a number or text.
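The loading-condition values "1" through "7" enumerate every non-empty subset of the three parameters, which is exactly a three-bit mask (bit 0 = emotion, bit 1 = expression, bit 2 = tune). The following sketch decodes the condition that way; the bit assignment is inferred from the enumeration in the description, and the dictionary keys are illustrative names, not terms from the patent.

```python
def decode_loading_condition(condition: int) -> dict:
    """Map a loading condition (1-7) to the set of parameters to load.

    1 -> emotion only, 2 -> expression only, 3 -> emotion + expression,
    4 -> tune only, ..., 7 -> all three, matching the enumeration in
    the description.
    """
    if not 1 <= condition <= 7:
        raise ValueError("loading condition must be between 1 and 7")
    return {
        "emotion": bool(condition & 1),     # bit 0
        "expression": bool(condition & 2),  # bit 1
        "tune": bool(condition & 4),        # bit 2
    }


# Condition 5 loads the emotion and tune parameters but skips the
# expression parameter, e.g. for a user who sings with a blank face.
print(decode_loading_condition(5))
```

Treating the condition as a bit mask keeps the setting a single small integer, which suits input through the function keys described in the text.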

fl#參數為數‘值,1及曲調參數為數值“川,,,且載入 ^件為數值“7”(即載人情緒參數、表情參數及曲調參 =’其處理馳1G6絲獻料私這三齡數之參 〜值進灯計异(例仏經由,運算或經由傅立葉轉換運 舁)後得到卩獅,如:數值‘‘88,,。制要說明的是, 則述情緒參數、表情參數及_參數可以制賴隔的方 式產生以間隔的產生方式為例,情緒參數、表情參數及 曲調參數均可根_設㈣關隔⑽如:五秒)來間隔產 生。特別要說明的是,本發明並未限定載入條件及門檻值 為數值’也就是說’所計算出的載入條件及門檻值除了為 數值之外,亦可以文字,如:“A”、“B”……等作為 代表。 音效模組107用以將門檻值與門檻區間進行比對,並 根據比對結城人對簡檻區狀音效語音進行播放,舉 例來說,假設門檻值為數值“88,’,其音效模組1〇7將根 據此門檻值與各音效語音所對應的門檻區間進行比對,若 經比對後得知門檻值在門檻區間的範圍内,例如:音效語 音“a.mp3’’所對應的門檻區間為“丨〜丨⑽,,,而門權值 88位於門植區間1〜1〇〇”内,因此,載入對應門播 區間之音效語音“a.mp3”進行播放。 如「第2圖」所示,「第2圖」為本發明具情境音效 的歌唱方法之流程圖,包含下列步驟:提供歌曲語音及音 效語音,其甲各音效語音分別對應一個門檻區間(步驟 12 201025289 201);接收選擇條件’並根據選擇條件播放歌曲語音之一 (步驟2G2);接收歌唱語音’並根據語音演算分析歌唱語 音後纽_錄(轉203),轉郷像,並根據^ 部演算辨賴視訊影像後產生表情參數(步驟2〇4);根據 ' 猶分浦放巾的歌曲語音,錄據分析結果產生曲 調參數(步驟205);根據載入條件載入情緒參數、表情參 數及曲調參數至少其中之—以計算門檻值(步驟雇);將 Π檻值與門檻區間進行比對,並根據比對結果載入對應門 植區間之音效語音進行播放(步驟2〇7)。透過上述步驟, 即可透過为析歌唱語音、視訊影像及歌曲曲調來計算門權 值’並且將糾檻值與音效語音麟_門麵間進行比 對,以便根據輯絲载人並賊合適的音效語音,用以 提高歌唱娛樂性。 以下配合「第3圖」至「第5圖」以實施例的方式進 行如下說明’如「第3圖」所示意,「第3圖」為應用本 ❹ 發明選擇歌曲語音進储狀示意II。當朗者要進行歌 唱時,可先透過歌曲選擇介面300中的曲目顯示區塊302 查询歌曲資料庫101所提供的歌曲語音,並透過曲號輸入 - 區塊301輸入欲歌唱的歌曲語音之曲號後,點選開始元件 303確定進行歌唱’此時’歌唱模組1〇2接收所輸入的曲 號作為選擇條件,並且根據此選擇條件播放對應的歌曲語 音。另外,使用者亦可透過點選重唱元件304使播放的歌 曲語音重新播放,以達到重唱的目的。 13 201025289 承上所述,請參閲「第4圖」,「第4圖」為應用本發 明進行歌唱之示意圖。當使用者選定歌曲語音後,將從歌 曲選擇介面300切換至歌唱介面携,並且於影音顯示區 塊410播放歌曲語音,此時,使用者可以透過收音裝置 no(例如··麥克風)進行歌唱,狀音裝置11G能夠將使用 者的聲音傳遞至語音分析模組103,接著,該語音分析模 組⑽自收音裝置110接收此聲音作為歌唱語音,並且根 • 據f音演算來對歌唱語音進行分析以產生情緒參數,舉例 來說’當使財歌唱的聲音(也就是所歌唱語音)為悲 怨悠長時’語音分韻組⑽娜語音演縣分析此歌; «-曰,並且將此歌唱語音進行量化以產生情緒參數,如: 數值“20” ’以此例而言,情緒參數之參數值越低,代表 使用者的情緒越為悲怨。 另外’除了上述收音裝置110接收使用者的歌唱聲音 之外,更具有攝影裝置ln拍攝使用者的即時影像並且 _ f此即時影像傳遞至影像辨識模組1G4,接著,影像辨識 她104擁取攝影裝置111所拍攝的即時影像作為視訊影 像,並且根據臉部演算來辨識視訊雜的表情後產生表情 參數’在實際實施上,可預先設置使用者的各種表情特 . 
^如:高興、傷心、生氣、驚tf、厭惡及害怕等,用以 提间臉部演算辨識表情的精確度。除此之外,視訊影像亦 可.4不於視訊顯不區塊411中方便讓使用者得知自己本 身的表情’而且此視訊顯示區塊411可透過點選視訊元件 201025289 412進行隱藏或顯示。 進扞t斤述,在語音分析模組103及影像辨識模組104 算來^的同時,曲調分析模組1〇5亦根據頻譜演 异來刀析播放中的歌曲語音以產生_參數,舉例來說, 二=音為抒情歌曲,曲調分析模纽105根據頻譜演 專立賴換的方式,用以分·曲語音的旋律(例 ▲ •曰色)來產生曲調參數,如:數值“ 1〇”,以此例而 "’輕柔的歌曲語音其所對應的曲調參數之參數值,將小 於慷慨激昂的歌曲語音所對應的曲調參數之參數值。 下來’處理模組106根據預設的載入條件,如:數 ,值7載入上述所產生的情緒參數“2〇”、表情參數“】 ,曲調參數“10”來計算值,如:數值“88,,其 計算的方式可透過四則運算或經由傅立葉轉換來實現。在 實際冗施上,使用者亦可透過按壓功能鍵,如:按塵鍵盤 ,鍵s來開啟載人條件的設定視窗(圖巾未示),用以設 疋載入條件,舉例來說,當遇到使用者在歌唱時習慣面無 表情的情況下,可設定載入條件為數值“5,,(即載入情緒 參數及曲調參數)’使處理模組106根據所設定的載入條 件僅載入情緒參數“2〇”及曲調參數“i〇”來計算門播 值’因為在面無表情的情況下’其表情參數的參考價值較 低’故不適合作為計算門檻值的參數。 當門檻值計算完成後’音效模組107將此門檻值與歌 曲資料庫101中的各音效語音之門檻區間進行比對,並且 15 201025289 根據=對結果自歌曲資料庫101載人對應門魏間之音 效语音進行播放,舉例來說,假設門植值為數值“88”, 右音效語音“a.mP3,,所對應的門檻區間為“1〜100”,則 代表門檻值位於其門檻區間中,故比對結果為符合,反 * 之’若門触不在數值“1”至數值“100”的範圍口中,則 比對結果為不符合,由於此例的比對結果符合,因此,自 歌曲貝料庫101載入所對應的音效語音“a.mp3” ,並且 ⑩ 透過揚聲裝置,如:♦八,進行播放。 另外,當使用者不欲繼續歌唱時,亦可透過點選返回 元件413停止播放歌曲語音,並且由當前的歌唱介面4〇〇 返回歌曲選擇介面3〇〇。除此之外,亦可將前述所判斷的 =緒參數、表情參數及_參數,収字的方式顯示於情 境顯不區塊414’舉例來說,假設情緒參數為數值‘‘2〇”、 表情參數為數值“1”及蝴參數為“ 10”則分別以文字 心心」傷心」及「柔和」來進行顯示。特別要說明的 ® 是’上例三個參數的數值與所代表的文字僅作為說明之 用,本發明並未限定這三個參數的表現形式。 ^另外’如「第5圖」所示意,「第5圖」為應用本發 _ ㈣定門楹區間之示意圖。前面提到,各音效語音均對應 * -個門捏區間,在實際實施上,其對應方式可透過對照表 來達成’舉例來說’以一鑛照表記錄門捏區間及音效語 音的對應關係’而且此門插區間可透過設^介面通進行 設定’其設定方式可透過門播區間設定區塊51〇設定門檻 16 201025289 區間的數值範圍,如:數值為“101〜200” ,以及所對應 的音效語音之檔案名稱“b.wav” ,並且在設定完成後點 選確定元件520儲存設定’亦或是點選取消元件53〇取消 所作的設定。特別要說明的是,在輸入音效語音之檔案名 ' 稱“b.wav”時,更可在此檔案名稱前輸入檔案的路徑。 綜上所述,可知本發明與先前技術之間的差異在於透 過分析歌唱語音、視訊影像及歌曲曲調來計算門播值,並 φ 且將此門檻值與音效語音所對應的門檻區間進行比對,以 便根據比對結果載入並播放合適的音效語音,藉由此一技 術手段可以在不同的情境下播放合適的音效語音,來解決 先刖技術所存在的問題,進而達成提高歌唱娱樂性之技術 功效。 雖然本發明以前述之實施例揭露如上,其並非用以 限定本㈣,任何„树者,林麟本發明之精 神和範_ ’當可許之更__,耻本發明 ❿=護顧須視本說明書所附之申請專利範圍所界 為準。 f圖式簡單說明】 . 
本發情境音效魄料統之方塊圖。 • 恤特_之流程圖。 圓。第3 _用本㈣選編語音進行播放之示意 第4圖為應用本發明進行歌唱之示意圖。 17 201025289 第5圖為應用本發明設定門檻區間之示意圖。 【主要元件符號說明】 101歌曲資料庫 102歌唱模組 103語音分析模組 104影像辨識模組 105曲調分析模組 106處理模組 107音效模組 110收音裝置 111攝影裝置 300歌曲選擇介面 301曲號輸入區塊 302曲目顯示區塊 303開始元件 304重唱元件 400歌唱介面 410影音顯示區塊 411視訊顯不區塊 412視訊元件 413返回元件 414情境顯示區塊 500設定介面 18 51〇門檻區間設定區塊 520確定元件 530取消元件 步驟201提供至少-歌曲語音及至少一音效語音, 其中各該音效語音分別對應一門檻區間 步驟202接收-選擇條件’並根據該選擇條件播放 該些歌曲語音之一 步驟2〇3接收-歌唱語音,並根據一語音演算分析 該歌唱§吾音後產生一情緒參數 步驟204擷取-視訊影像,並根據一臉部演算辨識 該視訊影像後產生一表情參數 步驟205根據一頻譜演算分析播放中的該歌曲語 曰,並根據分析結果產生一曲調參數 步驟206根據一载入條件載入該情緒參數、該表情 參數及該曲調參數至少其中之一以計算一 門檻值 步驟207將該門檻值與該門檻區間進行比對,並根 據比對結果載入對應該門檻區間之該音效 語音進行播放The fl# parameter is the number 'value, 1 and the tune parameter is the value "chuan,,, and the loaded piece is the value "7" (that is, the manned emotional parameter, the expression parameter and the tune parameter = 'the processing of the 1G6 silk offer private The parameters of the three-in-one number-values are obtained by the light meter (for example, via operation, or by Fourier transform), and the lion is obtained, for example, the value '88, the system is described, then the emotional parameters, The expression parameter and the _ parameter can be generated in the manner of interval generation as an example. The emotional parameter, the expression parameter and the tune parameter can be generated by the interval (10) (10), such as: five seconds, at intervals. Yes, the present invention does not limit the loading condition and the threshold value as the value 'that is, the calculated loading condition and the threshold value can be in addition to the numerical value, such as: "A", "B"... The sound module 107 is used to compare the threshold value with the threshold interval, and to play the simple sound effect sound according to the comparison of the Yucheng people. For example, suppose the threshold value is "88," , its sound module 1〇7 will be based on this threshold and The threshold interval corresponding to the sound effect is compared. 
If the threshold value is found to be within the threshold range, for example, the threshold interval corresponding to the sound effect voice “a.mp3′′ is “丨~丨(10), The door weight value 88 is located in the door interval 1~1〇〇", so the sound effect voice "a.mp3" corresponding to the gate interval is loaded for playback. As shown in "Fig. 2", "Fig. 2 The flow chart of the singing method with the context sound effect of the present invention comprises the following steps: providing a song voice and a sound effect voice, wherein each sound effect voice corresponds to a threshold interval respectively (step 12 201025289 201); receiving the selection condition 'according to the selection condition Play one of the song voices (step 2G2); receive the singing voice 'and analyze the singing voice according to the voice calculus, then record the voice, turn the image, and generate the expression parameters after the video image is determined according to the ^ part calculation (step 2 〇 4); according to the song voice of the 'Judepu towel, the data analysis result produces a tune parameter (step 205); according to the loading condition, at least one of the emotional parameter, the expression parameter and the tune parameter is loaded - to calculate the threshold Value (step hire); compares the threshold with the threshold interval, and plays the sound effect corresponding to the corresponding interval in the corresponding interval (step 2〇7). Through the above steps, you can calculate the door weight value by analyzing the singing voice, video image and song tunes, and compare the correction value with the sound effect voice _ _ façade, in order to match the sound of the singer and the thief. Voice, to improve the entertainment of singing. The following description will be made by way of example with reference to "Fig. 3" to "figure 5". As shown in Fig. 3, "Fig. 3" is a schematic diagram II for selecting a song voice storage for the application of the present invention. 
When the singer wants to sing, he can first query the song voice provided by the song database 101 through the track display block 302 in the song selection interface 300, and input the song voice to be sung through the track number input 301. After the number, the selection start element 303 determines to perform the singing 'this time' the singing module 1〇2 receives the input musical number as a selection condition, and plays the corresponding song voice according to the selection condition. In addition, the user can also replay the played song voice by clicking the re-singing component 304 to achieve the purpose of re-singing. 13 201025289 In the above, please refer to "Figure 4" and "Figure 4" for a schematic diagram of singing using the present invention. After the user selects the song voice, the song selection interface 300 is switched to the singing interface, and the song voice is played in the video display block 410. At this time, the user can sing through the radio device no (for example, a microphone). The voice device 11G can transmit the voice of the user to the voice analysis module 103. Then, the voice analysis module (10) receives the voice as the voice of the voice from the sound pickup device 110, and analyzes the voice of the voice according to the f-tone calculation. In order to generate emotional parameters, for example, 'When the voice of the song is sung (that is, the voice of the song) is long and sorrowful and sorrowful, the voice is composed of the voice group (10) Na Yunxian County analyzes the song; «-曰, and this singing voice Quantify to generate emotional parameters, such as: Value "20" 'In this case, the lower the parameter value of the emotional parameter, the more sorrowful the emotion representing the user. 
In addition, in addition to the above-mentioned sound receiving device 110 receiving the singing voice of the user, the photographing device ln further captures the user's real-time image and transmits the instant image to the image recognition module 1G4, and then the image recognition image 104 captures the image. The real-time image captured by the device 111 is used as a video image, and the facial expression is used to recognize the expression of the video and the expression parameter is generated. In actual implementation, the user's various expressions can be preset. ^如: happy, sad, angry , shock tf, disgust and fear, etc., used to improve the accuracy of facial expression calculation. In addition, the video image can also be displayed in the video display block 411 to facilitate the user to know his own expression 'and the video display block 411 can be hidden or displayed by clicking the video component 201025289 412. . In the speech analysis module 103 and the image recognition module 104, the tune analysis module 1〇5 also analyzes the song voice in the play according to the spectral difference to generate the _ parameter, for example. In other words, the two = tone is a lyric song, and the tune analysis module 105 is used to generate a tune parameter according to the melody of the music (eg ▲ • 曰 color) according to the way of the spectrum play, such as: 〇", as an example, the value of the tune parameter corresponding to the soft song voice will be smaller than the parameter value of the tune parameter corresponding to the impassioned song voice. The processing module 106 loads the above-mentioned generated emotional parameter "2〇", expression parameter "], and tune parameter "10" according to a preset loading condition, such as a number, a value of 7, to calculate a value, such as: "88, the way it is calculated can be achieved by four arithmetic operations or by Fourier transform. 
In actual redundancy, the user can also open the setting window of the manned condition by pressing the function key, such as pressing the dust keyboard and the key s (the towel is not shown), for setting the loading condition, for example, When the user is accustomed to being expressionless when singing, the loading condition can be set to a value of "5, (ie, loading emotional parameters and tune parameters)" so that the processing module 106 is based on the set loading conditions. Only the emotional parameter "2〇" and the tune parameter "i〇" are loaded to calculate the homing value 'because the reference value of the expression parameter is lower in the case of no expression", it is not suitable as a parameter for calculating the threshold value. After the threshold value calculation is completed, the sound effect module 107 compares the threshold value with the threshold interval of each sound effect voice in the song database 101, and 15 201025289 according to the = result from the song database 101 carrying the corresponding door Wei Wei The sound effect is played. For example, if the threshold value is “88” and the right sound voice “a.mP3, the corresponding threshold interval is “1~100”, the threshold value is in the threshold interval. Therefore The result is a match, and if the door is not in the range of the value "1" to the value "100", the comparison result is non-conformity, since the comparison result of this example is consistent, therefore, from the song library 101 Load the corresponding sound effect voice "a.mp3", and 10 through the speaker device, such as: ♦ eight, play. In addition, when the user does not want to continue singing, the song voice can be stopped by clicking the return component 413, and the song selection interface 3〇〇 is returned from the current singing interface 4〇〇. 
In addition, the emotion parameter, expression parameter, and tune parameter described above may be displayed as text in the situation display block 414. For example, when the emotion parameter is "20", the expression parameter is "1", and the tune parameter is "10", the displayed text may be "sad" and "soft". In particular, the values of the three parameters in the above example and the text they represent are for illustrative purposes only; the present invention does not limit how these three parameters are expressed. As shown in FIG. 5, which is a schematic diagram of setting a threshold interval by applying the present invention, each sound effect voice corresponds to a threshold interval, as mentioned above. In actual implementation, the correspondence between threshold intervals and sound effect voices can be established through a comparison table, and the threshold intervals can be set through the setting interface 500. To set a threshold interval, the user enters its value range, for example "101~200", and the file name of the corresponding sound effect voice, "b.wav", in the threshold interval setting block 510; after the setting is completed, clicking the determine component 520 stores the setting, while clicking the cancel component 530 cancels it. In particular, when entering the file name "b.wav" of the sound effect voice, the path of the file can be entered before the file name. In summary, the difference between the present invention and the prior art is that a threshold value is calculated by analyzing the singing voice, the video image, and the song melody, and is then compared with the threshold intervals corresponding to the sound effect voices.
In this way, the appropriate sound effect voice is loaded and played according to the comparison result. This technical means can play a suitable sound effect voice in different situations, thereby solving the problems existing in the prior art and achieving the technical effect of improving singing entertainment.

Although the present invention has been disclosed in the foregoing embodiments, they are not intended to limit the present invention. Any person skilled in the art may make changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the present invention shall therefore be defined by the claims attached to this specification.

[Brief Description of the Drawings]
FIG. 1 is a block diagram of the singing system with situation sound effects of the present invention.
FIG. 2 is a flowchart of the singing method with situation sound effects of the present invention.
FIG. 3 is a schematic diagram of selecting a song voice by applying the present invention.
FIG. 4 is a schematic diagram of singing by applying the present invention.
FIG. 5 is a schematic diagram of setting a threshold interval by applying the present invention.

[Key Element Symbol Description]
101 song database
102 singing module
103 voice analysis module
104 image recognition module
105 tune analysis module
106 processing module
107 sound effect module
110 radio device
111 photography device
300 song selection interface
301 track number input block
302 track display block
303 start component
304 re-sing component
400 singing interface
410 video display block
411 video display block
412 video component
413 return component
414 situation display block
500 setting interface
510 threshold interval setting block
520 determine component
530 cancel component

Step 201: Provide at least one song voice and at least one sound effect voice, wherein each sound effect voice corresponds to a threshold interval.
Step 202: Receive a selection condition, and play one of the song voices according to the selection condition.
Step 203: Receive a singing voice, and analyze the singing voice according to a speech algorithm to generate an emotion parameter.
Step 204: Capture a video image, and recognize the video image according to a facial algorithm to generate an expression parameter.
Step 205: Analyze the song voice being played according to a spectrum algorithm, and generate a tune parameter according to the analysis result.
Step 206: Load at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to calculate a threshold value.
Step 207: Compare the threshold value with the threshold intervals, and play the sound effect voice corresponding to the matching threshold interval according to the comparison result.
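Steps 201 through 207 can be strung together as a single pipeline, sketched below under illustrative assumptions: the three analysis steps are stubbed with the example parameter values from the description, since the actual speech, facial, and spectrum algorithms are not specified in the patent, and the threshold is combined as a simple sum.

```python
# End-to-end sketch of the claimed method (steps 203-207).  All function
# bodies and parameter values are assumptions taken from the worked example.

def analyze_singing_voice(_voice):   # step 203 (stub: speech algorithm)
    return 20  # emotion parameter

def recognize_video_image(_image):   # step 204 (stub: facial algorithm)
    return 1   # expression parameter

def analyze_song_spectrum(_song):    # step 205 (stub: spectrum algorithm)
    return 10  # tune parameter

def run_pipeline(voice, image, song, effects):
    emotion = analyze_singing_voice(voice)
    expression = recognize_video_image(image)
    tune = analyze_song_spectrum(song)
    threshold = emotion + expression + tune        # step 206 (sum assumed)
    for filename, (low, high) in effects.items():  # step 207
        if low <= threshold <= high:
            return filename
    return None

effects = {"a.mp3": (1, 100), "b.wav": (101, 200)}
print(run_pipeline(None, None, None, effects))  # a.mp3 (threshold 31)
```

With the stubbed parameter values (20 + 1 + 10 = 31), the pipeline selects "a.mp3", whose interval "1~100" contains the computed threshold.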

Claims (1)

1. A singing system with situation sound effects, comprising:
   a song database, storing at least one song voice and at least one sound effect voice, wherein each sound effect voice corresponds to a threshold interval;
   a singing module, for receiving a selection condition and playing one of the song voices according to the selection condition;
   a voice analysis module, for receiving a singing voice and analyzing the singing voice according to a speech algorithm to generate an emotion parameter;
   an image recognition module, for capturing a video image and recognizing the video image according to a facial algorithm to generate an expression parameter;
   a tune analysis module, for analyzing the song voice being played according to a spectrum algorithm and generating a tune parameter according to the analysis result;
   a processing module, for loading at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to calculate a threshold value; and
   a sound effect module, for comparing the threshold value with the threshold intervals and, according to the comparison result, loading and playing the sound effect voice corresponding to the matching threshold interval.
2. The singing system with situation sound effects of claim 1, wherein the threshold interval is a range of one of values and text.
3. The singing system with situation sound effects of claim 1, wherein the singing voice is received through a radio device, and the video image is captured through a photography device.
4. The singing system with situation sound effects of claim 1, wherein the emotion parameter, the expression parameter, and the tune parameter are generated continuously or at intervals.
5. The singing system with situation sound effects of claim 1, wherein the voice analysis module, after receiving the singing voice, further filters the background sound of the singing voice according to a filtering parameter.
6. A singing method with situation sound effects, comprising the steps of:
   providing at least one song voice and at least one sound effect voice, wherein each sound effect voice corresponds to a threshold interval;
   receiving a selection condition, and playing one of the song voices according to the selection condition;
   receiving a singing voice, and analyzing the singing voice according to a speech algorithm to generate an emotion parameter;
   capturing a video image, and recognizing the video image according to a facial algorithm to generate an expression parameter;
   analyzing the song voice being played according to a spectrum algorithm, and generating a tune parameter according to the analysis result;
   loading at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to calculate a threshold value; and
   comparing the threshold value with the threshold intervals, and loading and playing the sound effect voice corresponding to the matching threshold interval according to the comparison result.
7. The singing method with situation sound effects of claim 6, wherein the threshold interval is a range of one of values and text.
8. The singing method with situation sound effects of claim 6, wherein the singing voice is received through a radio device, and the video image is captured through a photography device.
9. The singing method with situation sound effects of claim 6, wherein the emotion parameter, the expression parameter, and the tune parameter are generated continuously or at intervals.
10. The singing method with situation sound effects of claim 6, wherein after the singing voice is received, the background sound of the singing voice is further filtered according to a filtering parameter.
TW97150672A 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof TWI377559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97150672A TWI377559B (en) 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97150672A TWI377559B (en) 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof

Publications (2)

Publication Number Publication Date
TW201025289A true TW201025289A (en) 2010-07-01
TWI377559B TWI377559B (en) 2012-11-21

Family

ID=44852558

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97150672A TWI377559B (en) 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof

Country Status (1)

Country Link
TW (1) TWI377559B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104581348A (en) * 2015-01-27 2015-04-29 苏州乐聚一堂电子科技有限公司 Vocal accompaniment special visual effect system and method for processing vocal accompaniment special visual effects
TWI725535B (en) * 2019-08-30 2021-04-21 韓駿逸 Voice interaction method to detect user behavior and attribute characteristics

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI498880B (en) * 2012-12-20 2015-09-01 Univ Southern Taiwan Sci & Tec Automatic Sentiment Classification System with Scale Sound

Also Published As

Publication number Publication date
TWI377559B (en) 2012-11-21

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees